Wikitech labswiki https://wikitech.wikimedia.org/wiki/Main_Page MediaWiki 1.44.0-wmf.2 first-letter Media Special Talk User User talk Wikitech Wikitech talk File File talk MediaWiki MediaWiki talk Template Template talk Help Help talk Category Category talk Obsolete Obsolete talk OfficeIT OfficeIT talk Tool Tool talk Nova Resource Nova Resource Talk Heira Heira Talk TimedText TimedText talk Module Module talk User:Hex 2 2904 2243199 11302 2024-11-10T13:14:26Z Zabe 21763 Zabe moved page [[User:Hex]] to [[User:Scott]] without leaving a redirect: Automatically moved page while renaming the user "[[User:Hex|Hex]]" to "[[User:Scott|Scott]]" 11302 wikitext text/x-wiki Hello, I'm [[:en:User:Hex|Hex]] from the English Wikipedia. t7tjiw9j4ebrl1t427krcn61p0tob5e 2243203 2243199 2024-11-10T18:13:36Z DreamRimmer 37491 DreamRimmer moved page [[User:Scott]] to [[User:Hex]]: Automatically moved page while renaming the user "[[Special:CentralAuth/Scott|Scott]]" to "[[Special:CentralAuth/Hex|Hex]]" 11302 wikitext text/x-wiki Hello, I'm [[:en:User:Hex|Hex]] from the English Wikipedia. t7tjiw9j4ebrl1t427krcn61p0tob5e Deployments 0 4108 2243205 2243179 2024-11-10T18:22:47Z ScheduleDeploymentBot 37566 Add [[gerrit:1089182]] to Monday, November 11 UTC morning backport window 2243205 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you joined the {{irc|wikimedia-operations}} IRC channel as all deployment-related communications happen there. If you need help, contact [[mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}}; and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', thus the UTC time changes in March and November per [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. ** You can use the {{Clickable button 2|backport scheduling tool|url=https://schedule-deployment.toolforge.org/}} to more easily edit this page. * Tasks that meet [[/Inclusion criteria|Inclusion criteria]] '''require their own windows''', which includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and set backs, we recommend one hour for most tasks. **To create or modify a recurring deploy window, send a patchset to [[gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if they are likely to affect HTTP caching, introduce new cookies, or utilize new database tables. ** '''Announce''' deployments of major features to the community via [[meta:Tech/News/Next|Tech News]] and/or via other [[mediawikiwiki:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. __TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of November 11== ==={{Deployment_day|date=2024-11-10}}=== {{Deployment calendar event card |when=2024-11-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-11-11}}=== {{Deployment calendar event card |when=2024-11-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|Hamishcz|Hamish}} {{deploy|type=config|gerrit=1089182|title=Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki|status=}} - {{phabricator|T379500}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|ZhaoFJx|ZhaoFJx}} {{deploy|type=config|gerrit=1078764|title=zhwiki: Allow event-organizer self remove usergroup|status=}} - {{phabricator|T376061}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-11-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2024-11-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|tgr|Gergő}} {{deploy|type=1.44.0-wmf.2|gerrit=1088770|title=Fix warning about missing central account for temp users|status=}} - {{phabricator|T378289}} {{deploy|type=1.44.0-wmf.2|gerrit=1088771|title=Check session provider when autocreating|status=}} - {{phabricator|T378289}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-11-11 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.3</code> }} {{Deployment calendar event card |when=2024-11-11 20:00 SF |length=1 |window=Automatic deployment of of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.3</code> to testwikis }} {{Deployment calendar event card |when=2024-11-11 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-11-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-12}}=== {{Deployment calendar event card |when=2024-11-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-11-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-12 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2|1.44.0-wmf.2}} * group0 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-13}}=== {{Deployment calendar event card |when=2024-11-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2}} * group1 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-11-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2}} * group1 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-13 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-14}}=== {{Deployment calendar event card |when=2024-11-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3}} * group2 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 08:00 SF |length=1 |window=Train log triage |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-11-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-14 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-11-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-14 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3}} * group2 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-15}}=== {{Deployment calendar event card |when=2024-11-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-11-15 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-11-16}}=== {{Deployment calendar event card |when=2024-11-16 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of November 18== ==={{Deployment_day|date=2024-11-17}}=== {{Deployment calendar event card |when=2024-11-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-11-18}}=== {{Deployment calendar event card |when=2024-11-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-11-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2024-11-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-11-18 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.4</code> }} {{Deployment calendar event card |when=2024-11-18 20:00 SF |length=1 |window=Automatic deployment of of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.4</code> to testwikis }} {{Deployment calendar event card |when=2024-11-18 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-11-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-19}}=== {{Deployment calendar event card |when=2024-11-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3|1.44.0-wmf.3}} * group0 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-11-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-19 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3|1.44.0-wmf.3}} * group0 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-20}}=== {{Deployment calendar event card |when=2024-11-20 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3}} * group1 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-20 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-11-20 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-20 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3}} * group1 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-20 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-20 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-21}}=== {{Deployment calendar event card |when=2024-11-21 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4}} * group2 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-21 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-21 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-21 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-11-21 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-21 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-11-21 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-21 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4}} * group2 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-21 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-22}}=== {{Deployment calendar event card |when=2024-11-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-11-22 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-11-23}}=== {{Deployment calendar event card |when=2024-11-23 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} 9qvf92td8s2y0l3fo8g2c8q8zb9rqne 2243249 2243205 2024-11-11T11:04:39Z ScheduleDeploymentBot 37566 Add [[gerrit:1082174]] to Monday, November 11 UTC late backport window 2243249 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you joined the {{irc|wikimedia-operations}} IRC channel as all deployment-related communications happen there. If you need help, contact [[mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}}; and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', thus the UTC time changes in March and November per [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. ** You can use the {{Clickable button 2|backport scheduling tool|url=https://schedule-deployment.toolforge.org/}} to more easily edit this page. * Tasks that meet [[/Inclusion criteria|Inclusion criteria]] '''require their own windows''', which includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and set backs, we recommend one hour for most tasks. **To create or modify a recurring deploy window, send a patchset to [[gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if they are likely to affect HTTP caching, introduce new cookies, or utilize new database tables. ** '''Announce''' deployments of major features to the community via [[meta:Tech/News/Next|Tech News]] and/or via other [[mediawikiwiki:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. __TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of November 11== ==={{Deployment_day|date=2024-11-10}}=== {{Deployment calendar event card |when=2024-11-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-11-11}}=== {{Deployment calendar event card |when=2024-11-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|Hamishcz|Hamish}} {{deploy|type=config|gerrit=1089182|title=Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki|status=}} - {{phabricator|T379500}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|ZhaoFJx|ZhaoFJx}} {{deploy|type=config|gerrit=1078764|title=zhwiki: Allow event-organizer self remove usergroup|status=}} - {{phabricator|T376061}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-11-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2024-11-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|tgr|Gergő}} {{deploy|type=1.44.0-wmf.2|gerrit=1088770|title=Fix warning about missing central account for temp users|status=}} - {{phabricator|T378289}} {{deploy|type=1.44.0-wmf.2|gerrit=1088771|title=Check session provider when autocreating|status=}} - {{phabricator|T378289}} {{ircnick|Ammar|Ammar}} {{deploy|type=config|gerrit=1082174|title=contactpages: Update Affcom UserGroup application form|status=}} - {{phabricator|T375392}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-11 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-11-11 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.3</code> }} {{Deployment calendar event card |when=2024-11-11 20:00 SF |length=1 |window=Automatic deployment of of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.3</code> to testwikis }} {{Deployment calendar event card |when=2024-11-11 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-11-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-12}}=== {{Deployment calendar event card |when=2024-11-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-11-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-12 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2|1.44.0-wmf.2}} * group0 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-13}}=== {{Deployment calendar event card |when=2024-11-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2}} * group1 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-11-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3|1.44.0-wmf.2}} * group1 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-13 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-13 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-14}}=== {{Deployment calendar event card |when=2024-11-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3}} * group2 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 08:00 SF |length=1 |window=Train log triage |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-11-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-14 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-11-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-14 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3|1.44.0-wmf.3|1.44.0-wmf.2->1.44.0-wmf.3}} * group2 to [[mw:MediaWiki_1.44/wmf.3|1.44.0-wmf.3]] * '''Blockers: {{phabricator|T375662}}''' }} {{Deployment calendar event card |when=2024-11-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-15}}=== {{Deployment calendar event card |when=2024-11-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-11-15 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-11-16}}=== {{Deployment calendar event card |when=2024-11-16 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of November 18== ==={{Deployment_day|date=2024-11-17}}=== {{Deployment calendar event card |when=2024-11-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-11-18}}=== {{Deployment calendar event card |when=2024-11-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-11-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2024-11-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-18 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-11-18 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.4</code> }} {{Deployment calendar event card |when=2024-11-18 20:00 SF |length=1 |window=Automatic deployment of of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.4</code> to testwikis }} {{Deployment calendar event card |when=2024-11-18 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-11-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-19}}=== {{Deployment calendar event card |when=2024-11-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3|1.44.0-wmf.3}} * group0 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-11-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-19 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3|1.44.0-wmf.3}} * group0 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-20}}=== {{Deployment calendar event card |when=2024-11-20 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3}} * group1 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-20 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-11-20 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-20 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4|1.44.0-wmf.3}} * group1 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-20 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-20 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-11-20 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-20 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-11-21}}=== {{Deployment calendar event card |when=2024-11-21 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4}} * group2 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-21 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-21 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-11-21 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-11-21 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-11-21 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-11-21 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-11-21 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.4|1.44.0-wmf.4|1.44.0-wmf.3->1.44.0-wmf.4}} * group2 to [[mw:MediaWiki_1.44/wmf.4|1.44.0-wmf.4]] * '''Blockers: {{phabricator|T375663}}''' }} {{Deployment calendar event card |when=2024-11-21 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-11-21 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-11-22}}=== {{Deployment calendar event card |when=2024-11-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-11-22 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-11-23}}=== {{Deployment calendar event card |when=2024-11-23 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} je11wm7l4h9nv36ex6mbfeuq3p8xwym Obsolete:Yahoo cluster 110 4347 2243230 339591 2024-11-11T08:27:52Z Revi C. 1919 added [[Category:Data centers]] using [[Help:Gadget-HotCat|HotCat]] 2243230 wikitext text/x-wiki '''Yaseo''' was a set of Wikimedia machines hosted in Yahoo!'s data centre in Seoul, South Korea. == See also == * [[Obsolete:Yaseo migration plan|Yaseo migration plan]] [[Category:Data centers]] 753wi4hq3gcxekkb0hgem4tykef9m12 2243231 2243230 2024-11-11T08:31:06Z Revi C. 1919 2006 obsolete, from that migration plan 2243231 wikitext text/x-wiki {{Obsolete|year=2006}} '''Yaseo''' was a set of Wikimedia machines hosted in Yahoo!'s data centre in Seoul, South Korea. == See also == * [[Obsolete:Yaseo migration plan|Yaseo migration plan]] [[Category:Data centers]] kqwwbts4zfesc0m05pujibt4vic8t2l Nova Resource:Math/SAL 498 5476 2243207 2196322 2024-11-10T18:31:29Z Stashbot 7414 physikerwelt: test 2243207 wikitext text/x-wiki === 2024-11-10 === * 18:31 physikerwelt: test === 2024-06-20 === * 09:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) * 09:38 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs === 2024-01-09 === * 03:51 andrewbogott: raised quota to gigabytes=260 for [[phab:T354579|T354579]] === 2020-01-14 === * 02:20 andrewbogott: rebooting mathosphere to resolve hangs associated with an old NFS failure === 2019-03-14 === * 23:13 bd808: Deleted drmf, math-ru, mathoid2 ([[phab:T204509|T204509]]) === 2019-03-01 === * 10:39 arturo: shut down drmft instance again (was active by mistake due to workload reallocation) Will be deleted soon anyway === 2019-02-05 === * 17:23 arturo: make myself projectadmin for better view of internal project logs * 17:18 arturo: [[phab:T204509|T204509]] shutting down again drmf-beta and drmf === 2019-01-31 === * 12:05 arturo: VM instances drmf,drmf-beta,math-docker, were stopped briefly due to issue in hypervisor ([[phab:T215012|T215012]]) === 2019-01-21 === * 16:22 arturo: [[phab:T204509|T204509]] shutdown ubuntu *VM instances*: drmf-beta, drmf, mathoid2, math-ru * 16:21 arturo: [[phab:T204509|T204509]] shutdown ubuntu iamges: drmf-beta, drmf, mathoid2, math-ru === 2018-01-29 === * 16:16 andrewbogott: rehabilitating 'drmf' instance === 2018-01-09 === * 02:17 andrewbogott: rebooting drmf in an attempt to get puppet working === 2016-05-05 === * 22:58 bd808: Joined project as admin === March 4 === * 21:35 andrewbogott: (and also because Howie requested it) * 21:34 andrewbogott: moved http://drmf-beta.wmflabs.org to point to the drmf-beta instance, and http://drmf.wmflabs.org to point to the drmf instance. Because previously it was the other way around which was super confusing. === September 16 === * 19:36 andrewbogott: moving and rebooting mws instance === January 17 === * 08:57 andrewbogott: moving math-semantics to a new virt host to avoid a storage crunch. This will reboot the instance. === January 15 === * 08:48 andrewbogott: rebooted latexml-test === August 29 === * 06:05 Ryan_Lane: adding jiabao to work on math support for visualeditor {{SAL|Project Name=math}} <noinclude>[[Category:SAL]]</noinclude> 5x5yjfzfv6q6l5k41ckxvdpfx5qmx0s Server Admin Log 0 7919 2243195 2243168 2024-11-10T12:25:34Z Stashbot 7414 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json 2243195 wikitext text/x-wiki == 2024-11-10 == * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> naswxuwj8rp0mut1kb8cd16cbamk920 2243196 2243195 2024-11-10T12:26:13Z Stashbot 7414 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index 2243196 wikitext text/x-wiki == 2024-11-10 == * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> oso3vta4jc939gf3hegtonts723d2qp 2243197 2243196 2024-11-10T12:26:26Z Stashbot 7414 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index 2243197 wikitext text/x-wiki == 2024-11-10 == * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> cwiqc7ow7huwdvouq3mtwkl2l6fupvv 2243198 2243197 2024-11-10T12:32:16Z Stashbot 7414 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) 2243198 wikitext text/x-wiki == 2024-11-10 == * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 5b7loi033eqyp0f1brp16uaa9kdbwiv 2243209 2243198 2024-11-10T22:29:38Z Stashbot 7414 jhathaway: re-imaging ms-be2082 to test efi boot order 2243209 wikitext text/x-wiki == 2024-11-10 == * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> j6kx3k3an3izczoyndogxk4q53ayt4u 2243210 2243209 2024-11-10T22:51:28Z Stashbot 7414 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye 2243210 wikitext text/x-wiki == 2024-11-10 == * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2d68b9ngf9b5424tjesgvej00uz98gx 2243211 2243210 2024-11-10T23:14:49Z Stashbot 7414 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage 2243211 wikitext text/x-wiki == 2024-11-10 == * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> r88doa9xpthhhxoy1vf4xx77mom9w9c 2243212 2243211 2024-11-10T23:17:37Z Stashbot 7414 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage 2243212 wikitext text/x-wiki == 2024-11-10 == * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> nev07azep8m4ftae4tq9lyxx1xeu2am 2243214 2243212 2024-11-10T23:43:54Z Stashbot 7414 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye 2243214 wikitext text/x-wiki == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2vplzw5772copn83z90khcutr1umjlt 2243218 2243214 2024-11-11T07:15:14Z Stashbot 7414 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 2243218 wikitext text/x-wiki == 2024-11-11 == * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> qipbegnwpkjdno5nmb0o0pxnk3sn0si 2243219 2243218 2024-11-11T07:49:13Z Stashbot 7414 _joe_: installing conftool 4.1.0 on puppetservers 2243219 wikitext text/x-wiki == 2024-11-11 == * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> b8f2nrfaoyje44tel9948nzo6ztxfkj 2243220 2243219 2024-11-11T07:51:00Z Stashbot 7414 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet 2243220 wikitext text/x-wiki == 2024-11-11 == * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 8lyyrzztvsge6peuabqnfkd82c3qmsj 2243221 2243220 2024-11-11T08:11:16Z Stashbot 7414 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182|Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] 2243221 wikitext text/x-wiki == 2024-11-11 == * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> nc89bz6izucpepjt315u3j2ittk99lr 2243224 2243221 2024-11-11T08:17:53Z Stashbot 7414 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" 2243224 wikitext text/x-wiki == 2024-11-11 == * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 99yd0d3p3e9aprjv4zuhy11r69b6xyy 2243225 2243224 2024-11-11T08:17:55Z Stashbot 7414 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 2243225 wikitext text/x-wiki == 2024-11-11 == * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> ltut7xgctdm9o4fdihq99sye050jyl6 2243226 2243225 2024-11-11T08:18:29Z Stashbot 7414 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 2243226 wikitext text/x-wiki == 2024-11-11 == * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> l81wtmdjpn7q8n7mmenu5ua7c8c2mel 2243227 2243226 2024-11-11T08:18:30Z Stashbot 7414 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" 2243227 wikitext text/x-wiki == 2024-11-11 == * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> gs335rzvr9dzuhew4k2o6jq6l6wu1h9 2243228 2243227 2024-11-11T08:22:47Z Stashbot 7414 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182|Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) 2243228 wikitext text/x-wiki == 2024-11-11 == * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 57pa1mezk1s3rb6lnl9uslehwh4i2im 2243229 2243228 2024-11-11T08:24:08Z Stashbot 7414 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync 2243229 wikitext text/x-wiki == 2024-11-11 == * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> dm4bh990a9omqteq29x8tvmi7jjq2eb 2243232 2243229 2024-11-11T08:32:15Z Stashbot 7414 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182|Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) 2243232 wikitext text/x-wiki == 2024-11-11 == * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> s3962r9n6wrknk16d2xw9ddcdwp5ruk 2243233 2243232 2024-11-11T08:32:52Z Stashbot 7414 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628|Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559|Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] 2243233 wikitext text/x-wiki == 2024-11-11 == * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> rzv7vggwr239pzzrh1u2mx1p7oxfkws 2243234 2243233 2024-11-11T08:35:03Z Stashbot 7414 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628|Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559|Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) 2243234 wikitext text/x-wiki == 2024-11-11 == * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> cx09jako2utj0h1hfawcdb862y0eumx 2243235 2243234 2024-11-11T08:35:27Z Stashbot 7414 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync 2243235 wikitext text/x-wiki == 2024-11-11 == * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> knepkfhto10sm8uuyi37im992iy1u1u 2243236 2243235 2024-11-11T08:40:07Z Stashbot 7414 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628|Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559|Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) 2243236 wikitext text/x-wiki == 2024-11-11 == * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> h7p7nl7gkz5ctolq6llpl35uz5k16dr 2243237 2243236 2024-11-11T09:02:19Z Stashbot 7414 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet 2243237 wikitext text/x-wiki == 2024-11-11 == * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 494pwr7mdx01zhyt3dokht7tugxqfpd 2243238 2243237 2024-11-11T09:10:13Z Stashbot 7414 moritzm: remove ganeti1011 from active ganeti nodes T378921 2243238 wikitext text/x-wiki == 2024-11-11 == * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 1pumhrqk4ty90uh0xj4d175eu758xzj 2243239 2243238 2024-11-11T10:00:42Z Stashbot 7414 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" 2243239 wikitext text/x-wiki == 2024-11-11 == * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> okb25gu1bdqia1kjyeu5d10l07li2ki 2243240 2243239 2024-11-11T10:00:44Z Stashbot 7414 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 2243240 wikitext text/x-wiki == 2024-11-11 == * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> aaam68q7slmp7v2rwn20bwfaxlbppgb 2243241 2243240 2024-11-11T10:01:15Z Stashbot 7414 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 2243241 wikitext text/x-wiki == 2024-11-11 == * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 8u5eal29ec0u80ymw7n060sbll03yhp 2243242 2243241 2024-11-11T10:01:18Z Stashbot 7414 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" 2243242 wikitext text/x-wiki == 2024-11-11 == * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> afxv40i2wz43rdgfvr36hrybv6shkoi 2243245 2243242 2024-11-11T10:55:17Z Stashbot 7414 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 2243245 wikitext text/x-wiki == 2024-11-11 == * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> b139rufgf7c0ho5k8buizru7qrnax22 2243247 2243245 2024-11-11T10:57:39Z Stashbot 7414 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views 2243247 wikitext text/x-wiki == 2024-11-11 == * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> q321l8tj6h2m2m03gqr9g4kyeuhmeo5 2243250 2243247 2024-11-11T11:04:45Z Stashbot 7414 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) 2243250 wikitext text/x-wiki == 2024-11-11 == * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 3mb3x9cupck8ehgabvvz5awror0wq26 2243252 2243250 2024-11-11T11:06:24Z Stashbot 7414 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 2243252 wikitext text/x-wiki == 2024-11-11 == * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> fsn90kjxgimovfxwq6pkv9qf9nwaqc1 2243255 2243252 2024-11-11T11:30:05Z Stashbot 7414 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 2243255 wikitext text/x-wiki == 2024-11-11 == * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 9sp56qbehtc12t2szqqjtcck4l1va69 2243256 2243255 2024-11-11T11:43:34Z Stashbot 7414 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye 2243256 wikitext text/x-wiki == 2024-11-11 == * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> gdr53pinmjs5cgp2sw30ri24d4hoaob 2243257 2243256 2024-11-11T11:43:40Z Stashbot 7414 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye 2243257 wikitext text/x-wiki == 2024-11-11 == * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> avd64iqv6tvcpapnay7820wa5rg13l2 2243258 2243257 2024-11-11T11:43:47Z Stashbot 7414 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet 2243258 wikitext text/x-wiki == 2024-11-11 == * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> g6lk36222q2gxgen5ete5yk315m9l9c 2243259 2243258 2024-11-11T11:44:22Z Stashbot 7414 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye 2243259 wikitext text/x-wiki == 2024-11-11 == * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> asc52lhoo4w5fh8br3klvrahgp12js2 2243260 2243259 2024-11-11T11:46:21Z Stashbot 7414 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch 2243260 wikitext text/x-wiki == 2024-11-11 == * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> gny7yhj3ctf5qxojmti7t0ebiwpnikg 2243261 2243260 2024-11-11T11:54:06Z Stashbot 7414 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch 2243261 wikitext text/x-wiki == 2024-11-11 == * 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> jqsqui4te1juntwh0ttd2r52wgumsfr 2243262 2243261 2024-11-11T11:56:06Z Stashbot 7414 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet 2243262 wikitext text/x-wiki == 2024-11-11 == * 11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet * 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> jnfi89v5v10qeswz9v2j7jjzjr6hgxl 2243263 2243262 2024-11-11T11:56:49Z Stashbot 7414 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage 2243263 wikitext text/x-wiki == 2024-11-11 == * 11:56 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet * 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . == 2024-11-10 == * 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order * 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled) * 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index * 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json == 2024-11-09 == * 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply == 2024-11-08 == * 23:35 zabe: attach Sotiale's local accounts on newly created wikis * 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0 * 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C == 2024-11-07 == * 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] * 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} (duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2024-11-04 == * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - [[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> sjl5x0e67xx00q8iw2pgaxznldh1h1y Release Engineering/SAL 0 17290 2243208 2243181 2024-11-10T22:13:51Z Stashbot 7414 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1089252 2243208 wikitext text/x-wiki == 2024-11-10 == * 22:13 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1089252 * 00:50 Reedy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1088802 == 2024-11-08 == * 20:21 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1088627 [[phab:T379145|T379145]] == 2024-11-06 == * 18:46 brennen: creating https://gitlab.wikimedia.org/repos/wmf-packages per [[phab:T367322|T367322]], [[phab:T379187|T379187]] * 10:07 Lucas_WMDE: lucaswerkmeister-wmde@integration-castor05:~$ mv /srv/castor/castor-mw-ext-and-skins/master/mwext-node18-rundoc/_cacache<nowiki>{</nowiki>,-nuked-2024-11-06-delete-later<nowiki>}</nowiki>/ == 2024-11-05 == * 18:06 James_F: Zuul: [mediawiki/extensions/EmailNotifications] Add basic CI == 2024-11-04 == * 15:08 James_F: Zuul: Add TechieNK to trusted users * 15:06 James_F: Zuul: [mediawiki/extensions/AutoModerator] Add DiscussionTools dependency * 15:04 James_F: Zuul: [mediawiki/extensions/WhoIsWatching] Add Echo phan dependency * 15:04 James_F: Zuul: [mediawiki/extensions/ArrayFunctions] Add Scribunto phan dependency * 15:02 James_F: Zuul: [mediawiki/extensions/ExternalData] Add Scribunto phan dependency * 15:02 James_F: Zuul: [mediawiki/extensions/EntitySchema] Add WikibaseCirrusSearch phan dep, for [[phab:T376250|T376250]] * 11:42 hashar: Restarted CI Jenkins to update the Collapsible Sections plugin due to [[phab:T378864|T378864]] # [[phab:T378327|T378327]] == 2024-11-02 == * 13:15 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1085943 == 2024-10-31 == * 21:43 bd808: set `profile::puppetserver::swift_fetch_rings::active_server: deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud` in prefix puppet for `deployment-puppetserver` to fix puppet run on deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud * 21:36 bd808: bd808 got puppet ssl certs fixed on deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud * 20:31 bd808: bd808 messed up puppet certs on deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud with an unplanned `sudo rm -rf /var/lib/puppet/ssl` and is now trying to figure out how to recover * 20:06 hashar: Restarted CI Jenkins to update the collapsible section plugin {{!}} https://github.com/jenkinsci/collapsing-console-sections-plugin/pull/35 {{!}} [[phab:T378327|T378327]] * 17:31 bd808: Created deployment-mediawiki81.deployment-prep.eqiad1.wikimedia.cloud ([[phab:T378752|T378752]]) * 15:28 hashar: Building Docker images for Quibble 1.11.0 * 15:13 hashar: Tag Quibble 1.11.0 @ {{Gerrit|85874bf22de241cf23074515fd2aa7a57878e57e}} # [[phab:T370380|T370380]], [[phab:T376602|T376602]] == 2024-10-30 == * 17:31 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1084808 "zuul: fix finding Zuul config dir" == 2024-10-29 == * 21:50 James_F: Docker: Update PHP 8.2 and 8.3 to latest in images * 21:17 James_F: Zuul: Provide experimental PHP 8.4 composer jobs where matched * 21:08 James_F: Docker: Provide PHP 8.4 composer images * 20:41 James_F: Zuul: Add php-compile-php84 experimental job to php-compile-php74-or-later * 19:31 James_F: Docker: Provide PHP 8.4 images based on 8.4.0-rc1; upgrade php-ast to v1.1.2 * 17:48 James_F: Zuul: [mediawiki/extensions/CollabPads] Add VisualEditor Dependency * 17:48 James_F: Zuul: [mediawiki/extensions/AIEditingAssistant] Add VisualEditor Dependency * 17:43 James_F: Zuul: [mediawiki/extensions/CheckUser] Add GlobalPreferences dependency, for [[phab:T377831|T377831]] * 17:42 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add GlobalPreferences dependency, for [[phab:T375508|T375508]] == 2024-10-23 == * 16:18 hashar: deployment-prep: sudo rm /var/lock/scap* * 16:18 hashar: deployment-prep: scap lock --unlock-all "Release all locks after Jenkins got restarted and SIGKILL deployment" * 16:17 hashar: deployment-prep: sudo rm /var/lock/scap-global-lock # it is an empty file, scap now expects a json payload == 2024-10-21 == * 09:42 hashar: Restarted CI Jenkins for plugins updates == 2024-10-19 == * 02:35 Krinkle: Disable publishing for extension-ApiFeatureUsage, ref [[phab:T143162|T143162]], [[phab:T313731|T313731]] == 2024-10-18 == * 14:33 hashar: zuul enqueue --trigger gerrit --pipeline postmerge --change {{Gerrit|1080409}},5 --project mediawiki/tools/phan/SecurityCheckPlugin # For @Daimona and [[phab:T372887|T372887]] == 2024-10-16 == * 17:25 greg-g: removed 2fa from JMando (confirmed in video call) == 2024-10-15 == * 22:02 James_F: Zuul: Archive the StickyTOC extension, for [[phab:T374778|T374778]] * 08:39 codders: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1076148 ([[phab:T50217|T50217]], [[phab:T377176|T377176]]) == 2024-10-11 == * 14:04 hashar: integration: force refresh the git mirrored repositories from integration-cumin: `sudo cumin -b1 'name:docker' 'systemctl start ci-gitcache-refresh.service'` # [[phab:T376981|T376981]] == 2024-10-10 == * 21:07 mutante: gerrit - removed all members from all WMDE gerrit groups listed on [[phab:T376951|T376951]] * 14:00 hashar: gerrit: reindex all changes (`gerrit index start changes --force`) * 12:37 James_F: Zuul: Sync'ed to contint.wikimedia.org to push new jjb patches to machine, already live on jenkins itself. == 2024-10-09 == * 23:29 bd808: Setup integration-agent-docker Puppet prefix and removed instance Puppet configuration for all integration-agent-docker-* instances. * 14:03 hashar: Updating MediaWiki CI jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1078628 == 2024-10-08 == * 08:30 hashar: gerrit: closed all connections for user suecarmol due to [[phab:T371749|T371749]] == 2024-10-06 == * 20:51 James_F: Docker: Re-building PHP images with composer v2.8.1, for [[phab:T376409|T376409]], and PHP 8.1 images with PHP 8.1.30-1+wmf11u1 == 2024-10-04 == * 14:03 hashar: Restarted zuul-merger on contint1002 * 13:44 hashar: gerrit: created community-tech group self owned, with initial member being TheresNoTime https://gerrit.wikimedia.org/r/admin/groups/f16e0c7dfb9aa1f4d78d367ab68a290c607df853 == 2024-10-02 == * 20:01 bd808: Rebooting deployment-mwmaint03.deployment-prep.eqiad1.wikimedia.cloud ([[phab:T376336|T376336]]) * 17:01 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=sectionlevelimages growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:55 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=linkrecommendation growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:49 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=imagerecommendation growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:43 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=E growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:36 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=D growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:33 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=C growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:25 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=B growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:19 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=A growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 16:12 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant` ([[phab:T374544|T374544]]) * 12:04 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=oldimpact growthexperiments-homepage-variant` ([[phab:T350077|T350077]]) == 2024-09-28 == * 13:33 hashar: Reloaded Zuul to deploy: Add phan dependency on AdminLinks for CloneDiff # https://gerrit.wikimedia.org/r/1076294 == 2024-09-27 == * 16:24 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MobileFrontend dependency for now, for [[phab:T375894|T375894]] == 2024-09-25 == * 23:46 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] +MobileFrontend test & phan dep == 2024-09-24 == * 20:14 James_F: Zuul: [mediawiki/extensions/Wikisource] Fix typo in phan deps * 19:58 James_F: Zuul: Split out deps and phan_deps to YAML files, for [[phab:T375541|T375541]] * 17:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1075274 * 17:17 Reedy: Reloading Zuul to deploy https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/514591/ * 16:53 James_F: Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs, for [[phab:T50217|T50217]] == 2024-09-23 == * 20:48 James_F: Zuul: [mediawiki/extensions/ParserFunctions] Remove master tests for LTS * 19:51 brennen: restarted phabricator-bullseye in devtools == 2024-09-20 == * 16:54 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1074426 == 2024-09-19 == * 09:03 hashar: Cleared deprecated approvals from CI Jenkins # [[phab:T375160|T375160]] * 09:02 hashar: CI Jenkins: approved 3 scripts we wrote which were pending approval at https://integration.wikimedia.org/ci/manage/scriptApproval/ # [[phab:T375160|T375160]] * 09:01 hashar: CI Jenkins: approved 3 scripts we wrote which were pending approval at https://integration.wikimedia.org/ci/manage/scriptApproval/ == 2024-09-18 == * 08:36 hashar: Updating Quibble Jenkins jobs to quibble 1.10.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1073542 {{!}} [[phab:T365976|T365976]] {{!}} [[phab:T50217|T50217]] == 2024-09-17 == * 15:57 hashar: Tag Quibble 1.10.0 @ {{Gerrit|2dd562aa75aca65413d009e3e382a0a67ed328d7}} # [[phab:T50217|T50217]] [[phab:T365976|T365976]] == 2024-09-12 == * 23:01 dduvall: deployed https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/79 and https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/80 to releases-jenkins == 2024-09-11 == * 23:11 James_F: Docker: [quibble-bullseye-php81] Re-build on Wikimedia-flavoured PHP, for [[phab:T372507|T372507]] * 22:58 James_F: Docker: [php81] Re-build based on Wikimedia-provided binaries, for [[phab:T372507|T372507]] * 20:19 James_F: Zuul: [mediawiki/extensions/CommunityConfigurationExample] Add CI, for [[phab:T373114|T373114]] == 2024-09-06 == * 14:21 James_F: jforrester@integration-castor05:/srv/castor$ sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node18-docs-publish/ # [[phab:T373937|T373937]] == 2024-09-05 == * 15:49 dancy: Updating scap to v4.101.0 in beta cluster * 14:46 James_F: sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WikifunctionsClient/ # [[phab:T374060|T374060]] == 2024-09-03 == * 20:16 jeena: Updating development images on contint primary for https://phabricator.wikimedia.org/T373721 * 11:48 hashar: Removing Maven Release plugin from CI Jenkins ( https://plugins.jenkins.io/m2release/ ). Was used until 2020 by analytics-refinery-release / wikidata-query-rdf-release-silent which instead went to use Docker images == 2024-09-02 == * 11:05 hashar: Restarted zuul-merger on contint2002: it had not been processing events since June 4th!!! == 2024-08-29 == * 12:08 zabe: BETA: zabe@deployment-mwmaint03:~$ foreachwiki extensions/WikimediaMaintenance/migrateESRefToContentTable.php # [[phab:T183490|T183490]] == 2024-08-27 == * 16:37 James_F: Zuul: Configure the REL1_43 test and gate pipelines, for [[phab:T372317|T372317]] * 00:39 dduvall: deploying changes to jenkins-deploy (https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/67) == 2024-08-26 == * 08:59 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ which broke due to a flappy connection with Github == 2024-08-23 == * 20:36 dancy: `docker system prune` and `docker image prune -a` on `integration-agent-docker-1056` * 15:07 dancy: Update gitlab-cloud-runners ingress-nginx to v1.11.2 * 15:05 James_F: jforrester@doc1003:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/CongressLookup/ # [[phab:T371339|T371339]] * 14:36 James_F: Docker: Building upgraded Node base images with latest releases: v18.20.2 => v18.20.4, v20.12.2 => v20.16.0, v22.0.0 => v22.6.0 * 14:34 James_F: Zuul: [mediawiki/extensions/MadLib] Mark as archived, for [[phab:T369999|T369999]] * 14:33 James_F: Zuul: [mediawiki/extensions/GlobalContribs] Mark as archived, for [[phab:T157240|T157240]] * 14:32 James_F: Zuul: [mediawiki/extensions/EmailDiff] Mark as archived, for [[phab:T369986|T369986]] * 14:30 James_F: Zuul: [mediawiki/extensions/EmailDeletedPages] Mark as archived, for [[phab:T369990|T369990]] * 14:29 James_F: Zuul: [mediawiki/extensions/Checkpoint] Mark as archived, for [[phab:T369989|T369989]] * 14:24 James_F: Zuul: [mediawiki/extensions/CongressLookup] Mark as archived, for [[phab:T371339|T371339]] * 14:19 James_F: Zuul: [mediawiki/extensions/WikiLambda] Also depend on VE for tests == 2024-08-22 == * 23:26 Krinkle: Manually abort quibble-phpunit jobs for GrowthExperiments patches in gate-and-submit pipeline given their selenium build has already failed, so that other patches +2'ed can start their retry and get merged. Ref [[phab:T373157|T373157]] * 20:23 dancy: `sudo docker system prune` on `integration-agent-docker-1056` to release some disk space. * 14:43 dancy: Upgrading gitlab-cloud-runners to gitlab-runner v17.1.1 == 2024-08-21 == * 19:32 dancy: Removed `role::aptly::client` from `deployment-prep` project Puppet ([[phab:T373051|T373051]]) * 16:53 dancy: Upgraded scap to 4.99.0 in beta * 15:33 dancy: Attempting to add `role::deployment_server::kubernetes` role to deployment-deploy04.deployment-prep == 2024-08-19 == * 19:38 dancy: Updating buildkitd to v0.15.2 in gitlab-cloud-runners == 2024-08-16 == * 16:16 thcipriani: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1062701 == 2024-08-13 == * 23:59 bd808: Added BetaDevOpsBot as a service account with admin rights for OpenTofu automation tasks == 2024-08-12 == * 21:15 bd808: Added BryanDavis (self) as a project admin == 2024-08-10 == * 12:47 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1061145 == 2024-08-09 == * 13:33 zabe: delete deployment-db12 and deployment-db13 # [[phab:T358329|T358329]] * 07:52 James_F: Zuul: [mediawiki/extension/WikimediaMaintenance] Enable code coverage, for [[phab:T372107|T372107]] * 07:51 James_F: Zuul: [mediawiki/extensions/Automoderator] Add ORES as a dependency, for [[phab:T352769|T352769]] * 07:43 James_F: Zuul: Note that NetworkSession is now in Wikimedia prod, for [[phab:T355267|T355267]] * 07:38 James_F: Zuul: [mediawiki/extensions/ElectonPdfService] Add FlaggedRevs as dependency for [[phab:T370740|T370740]] == 2024-08-08 == * 16:15 Amir1: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1060886 * 07:12 hashar: Upgrading CI Jenkins # [[phab:T371976|T371976]] == 2024-08-07 == * 17:25 greg-g: added thcipriani as owner of jenkins@wikimedia.org google group, removed myself == 2024-08-06 == * 19:29 hashar: Deleted deployment-deploy03 agent from the CI Jenkins, that got replaced by deployment-deploy04 by thcipriani in July as part of migrating deployment-prep instances out of Buster * 19:26 brennen: extloc: experimenting with running from Procfile ([[phab:T365665|T365665]]) * 10:37 hashar: Updating Jenkins jobs to migrate ci-src-setup-simple to Bookworm {{!}} https://gerrit.wikimedia.org/r/1060078 {{!}} [[phab:T335765|T335765]] == 2024-08-01 == * 23:15 dancy: Re-armed keyholder on deployment-deploy04 * 22:57 dancy: Reboot-testing deployment-deploy04 * 20:27 dancy: Adding a volume to deployment-deploy04 to hold /srv * 18:46 James_F: Zuul: [mediawiki/extensions/Chart] Add JsonConfig as a phan dep too * 18:40 James_F: Zuul: Rename ruby 3.1 jobs to 2.7 [merged but not deployed?] * 18:39 James_F: Zuul: Add JsonConfig as a dependency for Chart == 2024-07-26 == * 11:59 Lucas_WMDE: stopped deployment-deploy03 again * 11:59 Lucas_WMDE: copied /root/secrets.txt from deployment-deploy03 to deployment-deploy04 * 11:57 Lucas_WMDE: temporarily starting deployment-deploy03 again so I can copy /root/secrets.txt (beta cluster logstash access) out of it == 2024-07-24 == * 16:02 Southparkfan: moved sessionstorage/kask from sessionstorage04 to sessionstorage06 [[phab:T370461|T370461]] == 2024-07-23 == * 16:55 Southparkfan: cancel kask maintenance, not going to perform switchover yet, see https://phabricator.wikimedia.org/T370461 * 16:11 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 16:09 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 16:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 16:07 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 16:05 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) * 16:05 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:55 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:54 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:52 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:50 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:48 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:41 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:39 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:36 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:34 Southparkfan: starting kask maintenance - [[phab:T370461|T370461]] * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:31 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:31 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:29 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:26 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:26 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:24 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:22 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:12 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:11 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:09 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 14:02 hashar: Reloaded Zuul to enable Sonar Codehalth on PropertySuggester, QuickSurveys, Quiz, Score # [[phab:T321837|T321837]] * 10:25 hashar: integration: nuked Castor cache /srv/castor/castor-mw-ext-and-skins/master/mwgate-node18 == 2024-07-22 == * 17:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 17:05 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 17:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 17:00 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:30 Southparkfan: remove deployment-push-notifications01 - [[phab:T370459|T370459]] * 15:11 Southparkfan: remove deployment-parsoid12 - [[phab:T361386|T361386]] * 15:10 Southparkfan: remove deployment-mwmaint02 [[phab:T370582|T370582]] * 15:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 15:00 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 15:00 Southparkfan: delete deployment-urldownloader03 [[phab:T370466|T370466]] * 14:57 Southparkfan: delete deployment-mediawiki11 and deployment-mediawiki12 (incl PuppetDB data + volumes) [[phab:T361387|T361387]] * 14:43 Southparkfan: fix /srv/git/operations/puppet yet again ([[phab:T364492|T364492]]) via chown -R gitpuppet:gitpuppet on .git/, then use 'pgit' (gitpuppet wrapper) to reset to oot branch * 14:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 14:08 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 14:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 14:04 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 13:47 James_F: Zuul: Archive wikimedia/irc/ircservserv and wikimedia/irc/ircservserv-config, for [[phab:T344744|T344744]] * 13:39 James_F: Zuul: Drop rust testing from last two repos, for [[phab:T355363|T355363]] == 2024-07-20 == * 15:52 Southparkfan: add deployment-sessionstore05 (bookworm) - [[phab:T370461|T370461]] * 15:15 Southparkfan: add deployment-urldownloader04 (bookworm) - [[phab:T370466|T370466]] * 14:46 Southparkfan: deleted deployment-shellbox (buster) [[phab:T370462|T370462]] * 14:33 Southparkfan: add deployment-shellbox01 [[phab:T370462|T370462]] * 14:19 Southparkfan: err, adding deployment-mwmaint03, I meant - [[phab:T370582|T370582]] * 14:18 Southparkfan: add deployment-maint03, replacing the Buster instance [[phab:T370582|T370582]] == 2024-07-19 == * 16:56 Southparkfan: switching trafficserver backends from mediawiki11 and 12 to 13 and 14 - [[phab:T361387|T361387]] * 15:12 hashar: Switch Jenkins jobs to Quibble 1.9.4 * 14:47 Southparkfan: delete deployment-jobrunner04 (buster), replaced by 05 (bullseye) [[phab:T370487|T370487]] * 14:45 hashar: Tag Quibble 1.9.4 @ {{Gerrit|8d80ad0595b4273b4d0e35db71098f3cb577d4cd}} # [[phab:T370380|T370380]] * 14:38 Southparkfan: remove floating IP for deployment-ircd03 [[phab:T369919|T369919]] * 13:38 hashar: Updating Jenkins jobs for "NPM_CONFIG_cache to lower case" {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1055429 {{!}} [[phab:T370427|T370427]] * 11:03 Southparkfan: switched over cpjobqueue (running on deployment-changeprop-1) to deployment-jobrunner05 [[phab:T370487|T370487]] * 09:28 Southparkfan: create deployment-jobrunner05 with Bullseye image, [[phab:T370487|T370487]] == 2024-07-18 == * 22:49 Southparkfan: deleted deployment-irc02 (buster), released its floating IP, deactivated & cleaned on puppetserver-1, removed irc-next.beta.wmcloud.org A RR - [[phab:T369919|T369919]] * 22:05 Southparkfan: add deployment-ircd03 (bullseye) with floating IP and irc-next.beta.wmcloud.org - [[phab:T369919|T369919]] * 12:56 James_F: Zuul: [mediawiki/extensions/CollaborationKit] Mark as archived, for [[phab:T368092|T368092]] * 12:55 James_F: Zuul: [mediawiki/extensions/DeleteOwn] Mark as archived, for [[phab:T366663|T366663]] * 12:53 James_F: Zuul: [mediawiki/extensions/StickToThatLanguage] Mark as archived, for [[phab:T367670|T367670]] * 12:52 James_F: Zuul: [mediawiki/extensions/PageCreationNotif] Mark as archived, for [[phab:T367673|T367673]] * 09:01 zabe: drop gb_by from globalblocks table in beta cluster * 08:26 hashar: Deleted Buster based integration-agent-pkgbuilder-1001 integration-agent-pkgbuilder-1002 , replaced them with Bullseye based integration-agent-pkgbuilder-1003 and integration-agent-pkgbuilder-1004 # [[phab:T360786|T360786]] == 2024-07-17 == * 16:36 hashar: Created integration-agent-pkgbuilder-1003 to replace EOL Buster instances # [[phab:T360786|T360786]] == 2024-07-16 == * 08:04 hashar: integration: reimaging `integration-cumin` from Buster to Bullseye # [[phab:T360784|T360784]] * 03:04 thcipriani: enable beta-* jobs post beta etcd maintenance * 00:12 bd808: disable beta-code-update-eqiad until etcd maintenance concludes == 2024-07-15 == * 22:21 bd808: disable beta-update-databases-eqiad until etcd maintenance concludes * 21:59 bd808: disable beta-scap-sync-world until etcd maintenance concludes * 09:37 hashar: gerrit: added `ncmonitor` to the Service Users group {{!}} [[phab:T366637|T366637]] == 2024-07-12 == * 23:42 thcipriani: beta deployments now running from deployment-deploy04 (new bullseye host) * 23:20 thcipriani: un-skipping logstash checks in beta * 22:59 thcipriani: skipping logstash checks in beta * 21:38 thcipriani: update beta-* CI jobs, pool deployment-deploy04 in jenkins, offline deployment-deploy03 * 20:12 thcipriani: disable beta-code-update-eqiad/beta-scap-sync-world until server tinkering concludes * 20:02 bd808: Restarted Jenkins agent on deployment-deploy03 * 17:49 thcipriani: reconfigure beta-code-update-eqiad beta-scap-sync-world beta-update-databases-eqiad pending merge of https://gerrit.wikimedia.org/r/1053956 * 13:14 hashar: deployment-prep: fixed https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ which was block due to oojs-ui being updated ( https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1053811 & https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1053816 ) == 2024-07-11 == * 19:03 hashar: gerrit: added EarlyWarningBot to the Service Users group {{!}} [[phab:T323750|T323750]] * 16:52 dancy: Upgrade gitlab-runner to v17.0.1 in gitlab-cloud-runners (staging/production). * 12:35 hashar: Updating wikibase-client job to have it integrated with EntitySchema {{!}} https://gerrit.wikimedia.org/r/1052669 - [[phab:T367156|T367156]] * 12:05 jnuche: reenable beta-code-update-eqiad job in Jenkins CI * 11:37 jnuche: temporarily disable beta-code-update-eqiad job in Jenkins CI to allow for testing of config patch in beta == 2024-07-10 == * 14:20 hashar: Switching Quibble php7.4 jobs to lib ICU 67 # [[phab:T335766|T335766]] [[phab:T345561|T345561]] * 14:00 hashar: Switching some of the Jenkins jobs from Quibble 1.9.1 to 1.9.3 # [[phab:T368783|T368783]] & [[phab:T366799|T366799]] * 13:38 hashar: Building Docker images for Quibble 1.9.3 * 13:08 hashar: Tag Quibble 1.9.3 @ {{Gerrit|0687c7e620887a533feef1942faa564576876f6c}} # [[phab:T366799|T366799]] * 10:39 James_F: Zuul: [mediawiki/services/inference-services]: update article_descriptions src path, for [[phab:T369344|T369344]] * 09:56 James_F: Docker Drop PHP 8.0-based images, unused (no-op) * 09:38 James_F: Zuul: Drop PHP 8.0 jobs == 2024-07-08 == * 15:33 thcipriani: +voice lferreira-wmf (new wmf dx director, say hi :)) == 2024-07-06 == * 15:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc10 * 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc10 * 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc09 * 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc09 * 15:13 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc08 * 15:13 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc08 == 2024-07-05 == * 02:09 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1051621 == 2024-07-03 == * 08:50 hashar: Building Docker images for Quibble 1.9.2 * 08:37 hashar: Tag Quibble 1.9.2 @ {{Gerrit|046ec783998b5964a3148d05a1220fcec8998777}} # [[phab:T361190|T361190]] [[phab:T368783|T368783]] == 2024-07-02 == * 10:54 James_F: jforrester@doc1003:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/Listings/ # [[phab:T354997|T354997]] * 10:47 James_F: Zuul: [mediawiki/extensions/Listings] Mark as archived, for [[phab:T354997|T354997]] == 2024-07-01 == * 17:06 brennen: gitlab: once again setting default squash commit template for all projects, per MR discussion on [[phab:T366624|T366624]] * 11:09 hashar: Restarting Gerrit to apply configuration change {{!}} [[phab:T341291|T341291]] == 2024-06-26 == * 16:54 hashar: integration: fixed castor cache restoration which was broken since mid may # [[phab:T368550|T368550]] * 16:03 hashar: Updating all jobs to switch to releng/castor:0.4.0 and fix cache restoration # [[phab:T368550|T368550]] * 16:01 hashar: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1049976 * 08:54 hashar: gerrit: changed HEAD of operations/software/gerrit from deploy/wmf/stable-3.9 to deploy/wmf/stable-3.10 # [[phab:T367419|T367419]] == 2024-06-25 == * 13:27 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) * 13:14 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs * 12:23 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1047919 : zuul: Enable PHPUnit parallel for all WMDE-maintained repos == 2024-06-24 == * 23:01 bd808: Removed matanya's "reader" right per [[phab:T368330|T368330]] * 16:49 hashar: integration: rearmed keyholder on integration-cumin after the instance got rebooted due to a WMCS maintenance * 16:19 dancy: Upgrading scap to 4.89.0 in beta cluster * 14:42 James_F: Zuul: Enable PHP 8.3 as voting on REL1_42 for all extensions and skins, for [[phab:T353362|T353362]] * 14:37 James_F: Zuul: Enable PHP 8.3 as voting on master for all libraries except two, for [[phab:T353362|T353362]] * 12:08 James_F: Zuul: Enable PHP 8.3 for master branch of MW extensions and skins, for [[phab:T353362|T353362]] * 12:05 James_F: Zuul: Fix composer-test-package-php74-or-later to include PHP 8.3 * 08:39 hashar: releases1003: deleting left over temporary files from the MediaWiki branching (`rm -fR /tmp/mw-branching-*`) {{!}} [[phab:T368239|T368239]] * 07:21 hashar: Switching Jenkins jobs to Quibble 1.9.1 == 2024-06-20 == * 21:44 dancy: Upgraded buildkitd to v0.14.1 in gitlab-cloud-runners staging and production ([[phab:T367352|T367352]]) * 20:26 James_F: Zuul: [mediawiki/extensions/Chart] Install initial CI * 18:44 James_F: Zuul: [mediawiki/extensions/SemanticBundle] Mark as archived, for [[phab:T366277|T366277]] == 2024-06-19 == * 18:49 hashar: Updating docker-pkg files on contint primary to build Quibble 1.9.1 images {{!}} https://gerrit.wikimedia.org/r/1047589 * 18:40 hashar: Tag Quibble 1.9.1 @ {{Gerrit|1464e36eb82c08c4788a08473fcea8aaf57462da}} == 2024-06-18 == * 23:47 James_F: Zuul: [mediawiki/extensions/SocialLogin] Mark as archived, for [[phab:T246273|T246273]] * 23:47 James_F: Zuul: [mediawiki/extensions/SideBarMenu] Mark as archived, for [[phab:T347216|T347216]] * 23:47 James_F: Zuul: [mediawiki/extensions/NewsTicker] Mark as archived, for [[phab:T353928|T353928]] * 23:47 James_F: Zuul: [mediawiki/extensions/NamespaceHTML] Mark as archived, for [[phab:T360235|T360235]] * 23:42 James_F: Zuul: [mediawiki/tools/dependency-analysis] Mark as archived, for [[phab:T350258|T350258]] * 23:38 James_F: Zuul: Move 'in-wikimedia-production' repos to prod section * 23:38 James_F: Zuul: Drop config for wmf branches from -composer templates * 20:53 brennen: gitlab: squash commit templates: update all projects to use %<nowiki>{</nowiki>all_commits<nowiki>}</nowiki> ([[phab:T366624|T366624]]) * 14:40 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1047024 "Make PCC voting on Puppet 7" # [[phab:T367399|T367399]] * 13:21 hashar: deployment-prep: rearmed keyholder on deployment-deploy03 following the unexpected restart of the instance * 10:36 hashar: contint1002: deleted `/srv/zuul/git/operations/software/homer/deploy` and `/srv/zuul/git/operations/software/homer` for topranks due to zuul-merger bug [[phab:T157818|T157818]] * 10:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) * 09:38 taavi: set deployment-db11 as writable after reboot * 09:04 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs == 2024-06-17 == * 21:47 brennen: gitlab: set phorge integration as default instance-wide issue tracker ([[phab:T337570|T337570]]) - this may have knock-on effects * 18:39 James_F: Zuul: [mediawiki/extensions/WikifunctionsClient] Disable selenium * 14:11 Lucas_WMDE: lucaswerkmeister-wmde@deployment-deploy03:~$ mwscript sql testwiki <<< 'DROP TABLE entityschema_id_counter;' # see [[phab:T363153|T363153]] and {{Gerrit|Iaae48b794b}}: table was empty and should not have been created on this wiki in the first place * 12:14 hashar: Reloaded Zuul for "Enable parallel phpunit for WikibaseLexeme" https://gerrit.wikimedia.org/r/1046625 # [[phab:T361190|T361190]] * 12:13 hashar: Reloaded Zuul for "Enable parallel phpunit for WikibaseLexeme" https://gerrit.wikimedia.org/r/1046625 * 08:21 hashar: Updating Jenkins jobs for Quibble 1.9.0 https://gerrit.wikimedia.org/r/1044421 == 2024-06-15 == * 08:48 hashar: Refreshing CI Docker images for Quibble 1.9.0 {{!}} https://gerrit.wikimedia.org/r/1044365 * 08:19 hashar: Tag Quibble 1.9.0 @ {{Gerrit|7b43bbb674eca6f095b9c9b7d886d69840e611e2}} # [[phab:T361190|T361190]] == 2024-06-14 == * 14:01 hashar: gerrit: change fresh submit type from `ff only` to `Rebase if necessary` and enable `Allow content merges` == 2024-06-13 == * 14:50 hashar: gerrit: archived #gerrit3.9 milestone # [[phab:T354887|T354887]] * 14:50 hashar: gerrit: archived #gerrit3.8 milestone # [[phab:T354886|T354886]] * 14:50 hashar: gerrit: archived #gerrit-3.8 milestone # [[phab:T354886|T354886]] * 11:34 hashar: gerrit: reparent mediawiki/tools/cli from mediawiki/tools to All-Archived-Projects ([[phab:T351543|T351543]]) to prevent GitHub replication # [[phab:T333029|T333029]] * 11:31 hashar: gerrit: reparent mediawiki/extensions/WikibaseSchema from mediawiki/extensions to All-Archived-Projects ([[phab:T351543|T351543]]) to prevent GitHub replication # [[phab:T367396|T367396]] == 2024-06-12 == * 18:20 James_F: Zuul: Add CommunityConfiguration to the MediaWiki gated extension set, for [[phab:T366696|T366696]] * 06:47 hashar: gerrit: changed HEAD of operations/software/gerrit from `deploy/wmf/stable-3.8` to `deploy/wmf/stable-3.9` # [[phab:T354887|T354887]] * 06:47 hashar: gerrit: changed HEAD of operations/software/gerrit from `deploy/wmf/stable-3.8` to `deploy/wmf/stable-3.9` # [[phab:T354886|T354886]] == 2024-06-11 == * 21:01 hashar: gerrit: closing stall ssh connections for Matmarex using `ssh -p 29418 hashar@gerrit.wikimedia.org gerrit close-connection` == 2024-06-10 == * 16:00 James_F: Zuul: [mediawiki/extensions/BlueSpiceWikiFarm] Add BlueSpice CI * 15:58 James_F: Zuul: [mediawiki/extensions/CollabPads] Add BlueSpice CI * 15:56 James_F: Zuul: [mediawiki/extensions/AIEditingAssistant] Add BlueSpice CI * 15:53 James_F: Zuul: [mediawiki/extensions/InlineComments] Add Echo as a phan dependency * 15:51 James_F: Zuul: [translatewiki] Change job set to PHP 8.2+ * 15:47 James_F: Zuul: [labs/tools/crosswatch] Archive, for [[phab:T269703|T269703]] * 08:08 hashar: Reloaded Zuul to enable Sonar Codehealth on ApiFeatureUsage, Babel, LiquidThreads and OAuth MediaWiki extensions # [[phab:T321837|T321837]] == 2024-06-07 == * 11:14 pmiazga: proceeding with soft restart deployment-puppetserver-1 * 10:31 pmiazga: deployment-puppetserver-1 - in /srv/git/operations/puppet cherry-picked {{Gerrit|I477c4b72297d8e740461a029e0fd1c7bca818c2f}} to test wikivoyage.beta.wmcloud.org domain handling - [[phab:T355281|T355281]] == 2024-06-06 == * 18:30 pmiazga: [[phab:T355281|T355281]] testing mediawiki-config patch {{Gerrit|Idcd9cdde4e45950916306f676a07f10f4c68e2a8}}, executed `scap sync-world` * 17:56 pmiazga: Executed "mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki pl wikivoyage plwikivoyage pl.wikivoyage.beta.wmcloud.org" to add a new wiki - polish wikivoyage on beta.wmcloud.org domain * 17:49 pmiazga: [[phab:T355281|T355281]] updated DNS zones and hiera configs - added *.m.wikipedia.beta.wmcloud.org, *.wikivoyage.beta.wmcloud.org and *.m.wikivoyage.beta.wmcloud.org domains * 13:35 James_F: Zuul: [mediawiki/extensions/NumberHeadings]: Add initial CI * 13:30 James_F: Zuul: [mediawiki/extensions/TableTools] Remove dependency on VE * 13:26 James_F: Zuul: [mediawiki/extensions/MetricsPlatform]: Add initial CI, for [[phab:T366718|T366718]] == 2024-06-05 == * 17:31 James_F: Zuul: [mediawiki/extensions/WikifunctionsClient] Add initial CI * 11:31 urbanecm: urbanecm@deployment-deploy03:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php frwiktionary growthexperiments # [[phab:T366691|T366691]] * 11:31 urbanecm: urbanecm@deployment-deploy03:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php frwiktionary translate # [[phab:T366691|T366691]] * 11:18 urbanecm: urbanecm@deployment-deploy03:~$ mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiktionary fr wiktionary frwiktionary fr.wiktionary.beta.wmflabs.org # [[phab:T366691|T366691]] == 2024-06-04 == * 22:38 mutante: contint1002 (integration.wikimedia.org) jenkins/zuul - rebooting * 22:12 James_F: Zuul: Tried rolling mediawiki-quibble-composer-mysql-php74 to bullseye to test [[phab:T335766|T335766]], but still failing with "Cannot start ChromeHeadless … Multiple targets are not supported". Reverted. * 22:12 James_F: Zuul: Tried rolling mediawiki-quibble-composer-mysql-php74 to bullseye to test [[phab:T334766|T334766]], but still failing with "Cannot start ChromeHeadless … Multiple targets are not supported". Reverted. * 19:32 mutante: contint2002 - rebooting host * 18:11 dancy: Reenabled beta-code-update-eqiad for tgr * 15:52 dancy: Paused beta-code-update-eqiad for tgr * 13:58 James_F: Zuul: [mediawiki/services/image-suggestion-api] Mark as archived, for [[phab:T366534|T366534]] * 13:32 James_F: Zuul: [mediawiki/extensions/Html2Wiki] Mark as archived, for [[phab:T347236|T347236]] * 13:32 James_F: Zuul: [mediawiki/extensions/UploadBlacklist] Mark as archived, for [[phab:T347126|T347126]] * 13:32 James_F: Zuul: [mediawiki/extensions/PerPageLicense] Mark as archived, for [[phab:T346010|T346010]] * 13:31 James_F: Zuul: [mediawiki/extensions/MultimediaPlayer] Mark as archived, for [[phab:T349154|T349154]] * 13:31 James_F: Zuul: [mediawiki/extensions/MixedNamespaceSearchSuggestions] Mark as archived, for [[phab:T323200|T323200]] * 13:31 James_F: Zuul: [mediawiki/extensions/Interlanguage] Mark as archived, for [[phab:T349953|T349953]] * 13:31 James_F: Zuul: [mediawiki/extensions/DoubleWiki] Mark as archived, for [[phab:T344544|T344544]] * 13:31 James_F: Zuul: [mediawiki/extensions/DebugMode] Mark as archived, for [[phab:T346577|T346577]] * 11:03 pmiazga: added beta.wmcloud.org and *.wikipedia.beta.wmcloud.org definitions to SNI section in deployment-acme-chief and lets-encrypt section in deployment-cache in hiera config on horizon. * 10:34 pmiazga: Live debugging of puppet. Pulled {{Gerrit|Ifd37f0550689bed55c080ded56fd363070a8933f}} to puppetserver-1. Additionally fixed ownership of /srv/git/operations/puppet to `gitpuppet:gitpuppet` to solve problems with git pull. == 2024-06-02 == * away: [[phab:T366415|T366415]] removed upload.beta.wmflabs.org from hiera == 2024-05-30 == * 14:07 hashar: Updating Jenkins jobs for Quibble 1.8.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1037456 * 12:33 hashar: Updating Jenkins jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1037453 * 08:34 hashar: Tag Quibble 1.8.0 @ {{Gerrit|ee3ac324afd8b2935f334979ba73fdcd03be51f6}} # [[phab:T359043|T359043]] * 07:51 hashar: Reloaded Zuul to enable Sonar Codehealth on IPInfo, LabeledSectionTransclusion, MassMessage and Newsletter # [[phab:T321837|T321837]] == 2024-05-29 == * 13:03 brennen: gitlab: created configure-projects-bot admin user for [[phab:T355097|T355097]] == 2024-05-28 == * 13:17 James_F: Docker: [quibble] Set HOME=/tmp so Firefox etc. can work, for [[phab:T365871|T365871]] == 2024-05-24 == * 20:27 Krinkle: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1035812 * 11:36 Krinkle: Promote pmiazga from "reader" to "reader, member" for deployment-prep jn Horizon == 2024-05-23 == * 14:56 James_F: Zuul: Enable 'tarball' extensions already in the main gate * 14:28 James_F: Zuul: [mediawiki/extensions/Wikibase] Run PHP 8.1 jobs by default * 14:18 James_F: Zuul: Fix dependency injection for renamed Wikibase special jobs * 14:17 James_F: Zuul: [mediawiki/extensions/Wikibase] Run REL1_42 on PHP 8.1 jobs for [[phab:T362412|T362412]] * 14:03 James_F: Zuul: Specify PHP flavour in Wikibase special quibble jobs, add 81/83 for [[phab:T362412|T362412]] * 00:29 brennen: adding self as an owner to https://gitlab.wikimedia.org/repos/releng/ == 2024-05-22 == * 23:02 James_F: Zuul: Archive mediawiki/services/service-scaffold-<nowiki>{</nowiki>node,golang<nowiki>}</nowiki> for [[phab:T365512|T365512]] * 07:35 hashar: Jenkins: added back contint2002 as an Agent. The host got reimaged from Buster to Bullseye # [[phab:T334517|T334517]] == 2024-05-20 == * 19:31 dancy: Updated Jenkins mwcore-codehealth-master-non-voting and mwcore-codehealth-patch jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1034138 ([[phab:T365260|T365260]]) * 15:32 elukey: aggregated some pki-related local commits on deployment-puppetserver-1'1 /srv/git/labs/private to unblock git-sync-upstream - [[phab:T360595|T360595]] == 2024-05-17 == * 18:37 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1023086 * 14:14 James_F: Zuul: [mediawiki/extensions/CommunityRequests] basic CI * 13:19 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1032443 * 13:15 James_F: Zuul: Revert "JSDoc: add 5 extensions/skins" * 12:47 James_F: Zuul: Add Anterdc99 to CI allow list * 07:48 hashar: zuul: deleted /srv/zuul/git/operations/software due to namespace clash {{!}} [[phab:T157818|T157818]] == 2024-05-16 == * 17:52 brennen: running a no-op test deployment to phab2002 to reproduce errors for [[phab:T313624|T313624]] * 15:33 dancy: Updated Jenkins doxygen-publish job for https://gerrit.wikimedia.org/r/c/integration/config/+/1032515 ([[phab:T282893|T282893]]) * 15:15 dancy: Updating many Jenkins jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1028900 ([[phab:T282893|T282893]]) * 13:01 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add ULS as a dependency == 2024-05-15 == * 14:31 hashar: deployment-prep: sudo cumin --force '*' 'rm -f /etc/apt/sources.list.d/openstack-zed*' * 14:30 hashar: deployment-prep: armed keyholder on deployment-cumin (credentials are in the deployment-puppetserver-1 private git repo) == 2024-05-14 == * 17:51 taavi: reload zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1031519 * 08:09 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1025313 == 2024-05-13 == * 17:12 dancy: Ran `docker buildx prune --keep-storage 20000000000` on integration-agent-docker-1054 to free up ~19GB in /var/lib/docker. * 17:05 dancy: Updating jenkins jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1029621 * 15:18 hashar: deployment-prep: deleted security rule for 208.80.154.17 ssh and port 2 (sic) and allow 208.80.154.132 / contint1002 port 22 instead # [[phab:T334517|T334517]] * 14:46 James_F: Zuul: [mediawiki/extensions/DiscordRCFeed] Add Flow as a phan dep too * 13:48 hashar: Deleted Jenkins builds for the deleted jobs that had a `-docker` suffix {{!}} [[phab:T360327|T360327]] == 2024-05-10 == * 10:37 Amir1: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1030054 == 2024-05-07 == * 13:58 James_F: Zuul: [ooui] Drop PHP 8.0 testing * 13:58 James_F: Zuul: [wikipeg] Drop PHP 8.0 testing * 13:58 James_F: Zuul: [mediawiki/libs/parsoid] Drop PHP 8.0 testing * 13:46 James_F: Zuul: Drop PHP 8.0 testing from main branch * 13:43 James_F: Zuul: [mediawiki/extensions/ContainerFilter] Disable selenium tests * 13:41 James_F: Zuul: Create a bespoke template for BlueSpice extensions * 07:09 James_F: Zuul: [mediawiki/ruby/api] Drop ruby2.5 jobs == 2024-05-06 == * 20:29 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1028596 * 19:50 Reedy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1028587 * 19:01 dancy: Rebooting deployment-mediawiki12.deployment-prep to verify that it is reboot safe ([[phab:T356692|T356692]]) * 10:52 jnuche: Updated plugins in https://integration.wikimedia.org/ci to apply security fixes == 2024-05-05 == * 11:40 James_F: Zuul: Add Piracalamina to CI allowlist * 08:16 brennen: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1027516 * 07:19 taavi: reload zuul to add sam walton to allowlist * 06:25 James_F: Zuul: Add ConfirmEdit to phan dependencies for AbuseFilter, for [[phab:T20110|T20110]] == 2024-05-04 == * 18:12 Amir1: Reloading Zuul to deploy gerrit:1027233 * 14:36 taavi: reload zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1027183 * 12:59 taavi: releases-jenkins: use read-only LDAP servers * 12:57 James_F: Add taavi to releases-jenkins admins to try out fixing LDAP config * 11:52 brennen: phabricator 2fa reset for user Krabina == 2024-05-03 == * 14:12 James_F: Zuul: As an emergency, drop CI for BlueSpice extensions and skins * 11:09 brennen: created logspam-watch tag: https://phabricator.wikimedia.org/project/profile/7157/ == 2024-05-02 == * 15:56 dancy: Upgrading buildkit to v13.2 in gitlab-cloud-runners ([[phab:T364013|T364013]]) == 2024-04-30 == * 08:53 James_F: Zuul: Add XtexChooser to CI allowlist * 08:40 James_F: zuul: [mediawiki/extensions/CampaignEvents] Add WikimediaCampaignEvents dependency * 08:40 James_F: zuul: [mediawiki/extensions/ArticleFeedbackv5] Add SpamRegex as an AFTv5 phan dependency == 2024-04-29 == * 18:28 James_F: Zuul: [VisualEditor/VisualEditor] Publish docs on postmerge, like coverage * 16:13 James_F: Zuul: [mediawiki/extensions/BlueSpicePagesVisited] Add BlueSpiceDistributionConnector as a dependency == 2024-04-28 == * 19:04 James_F: Zuul: Provide Node 22 experimental jobs everywhere for [[phab:T363653|T363653]] * 18:38 James_F: Zuul: [wikimedia/slimapp] Upgrade PHP 8.2 CI to voting == 2024-04-26 == * 18:55 James_F: Zuul: [mediawiki/extensions/Math] Re-enable PHP 8.2 CI voting for [[phab:T360709|T360709]] * 18:18 James_F: Zuul: Re-apply PHP 8.2 CI to Wikibase-based code for [[phab:T360560|T360560]] [[phab:T324202|T324202]] [[phab:T353161|T353161]] == 2024-04-24 == * 14:54 hashar: Updating Jenkins jobs for buster-backports removal {{!}} [[phab:T362518|T362518]] {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1023870 * 13:45 hashar: Building Buster CI images to get rid of `buster-backports` {{!}} [[phab:T362518|T362518]] {{!}} https://gerrit.wikimedia.org/r/1023853 * 08:44 hashar: Reparented all BlueSpice extensions using `gerrit set-project-parent <projects> --parent mediawiki/extensions/BlueSpice` * 08:39 hashar: gerrit: created https://gerrit.wikimedia.org/r/admin/repos/mediawiki/extensions/BlueSpice to reparent BlueSpice repositories which will ease permissions management == 2024-04-23 == * 15:09 taavi: reload zuul for {{Gerrit|1023416}} * 11:09 James_F: Zuul: Drop skin-quibble-php74-or-later template, unused (no-op) * 09:04 hashar: Reloaded Zuul to enable Sonar Codehealth on TheWikipediaLibrary, timeline, WebAuthn and WikidataPageBanner # [[phab:T321837|T321837]] == 2024-04-22 == * 16:54 taavi: reload zuul for {{Gerrit|1023099}} * 08:39 taavi: reload zuul for {{Gerrit|1023047}} == 2024-04-21 == * 17:34 taavi: reload zuul for {{Gerrit|1022596}} == 2024-04-20 == * 18:04 taavi: gerrit: allow LabelBots (including LibUp) to vote V-1..+1 on mediawiki/libs/*, which overrides the global V label permissions == 2024-04-19 == * 15:06 hashar: gerrit: on https://gerrit.wikimedia.org/g/design/codex , deleted tag `1.3.0` (was same as v1.3.0). Requested by Volker == 2024-04-16 == * 20:48 James_F: Zuul: Switch performance/fresnel, unicodejs, and VisualEditor/VisualEditor to generic node jobs. * 15:57 James_F: Docker: Upgrade node18 and node20 images to latest * 15:30 hashar: devtools: deleted obsolete Buster instances `gerrit-bullseye-test` and `gerrit-prod-1001` # [[phab:T360964|T360964]] * 14:41 James_F: Zuul: [mediawiki/extensions/Wikibase] Drop quibble-vendor-mysql-php74-noselenium experimental job, already mainstream * 13:21 James_F: Zuul: [mediawiki/extensions/Wikibase] Disable custom jobs on REL1_42 temporarily for [[phab:T362412|T362412]] * 11:32 hashar: Released Collapsible Console Section Jenkins plugin 1.9.0 # [[phab:T362048|T362048]] * 11:15 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1019774 # [[phab:T362598|T362598]] * 09:12 taavi: reloading zuul for {{Gerrit|1019850}} == 2024-04-15 == * 16:17 hashar: Changed operations/software/gerrit HEAD to `deploy/wmf/stable-3.8` # [[phab:T354886|T354886]] == 2024-04-12 == * 18:47 James_F: Zuul: [wikipeg] Test special in Node 20 with PHP 8.0–8.3, all passing * 18:46 James_F: Zuul: [oojs/ui] Test in PHP 8.2 and 8.3, and Node 20; all passing * 18:17 James_F: Docker: Provide node20-test-browser-php81-composer and node20-test-browser-php83-composer * 18:10 James_F: Zuul: [oojs/ui] Switch to newly-parameterised jobs * 17:40 James_F: Docker: [doxygen] Upgrade to Doxygen v1.10.0 == 2024-04-11 == * 22:39 brennen: created vps-project-devtools tag: https://phabricator.wikimedia.org/tag/vps-project-devtools/ == 2024-04-10 == * 20:51 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add EventLogging phan dep * 18:20 jeena: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/64 * 05:14 hashar: Restarted CI Jenkins to upgrade "collapsible console sections" plugin to a snapshot version having https://github.com/jenkinsci/collapsing-console-sections-plugin/pull/26 # [[phab:T362048|T362048]] * 04:27 hashar: Reloaded Zuul to enable Sonar Codehealth on Cognate, EventBus, Graph, Interwiki and intersection # [[phab:T321837|T321837]] == 2024-04-08 == * 16:47 James_F: Zuul: [mediawiki/extensions/Wikibase] Drop extension-coverage coverage, not yet ready * 14:16 James_F: Zuul: [mediawiki/core] Make PHP 8.2 and 8.3 voting for REL1_39–41 [[phab:T361985|T361985]] * 08:57 hashar: Stopped Docker on all agents, manually deleted `/srv/jenkins/workspace` which accumulated leftover bits over time, restarted CI Jenkins * 08:17 hashar: Remove obsolete jobs containing `docker` in their name (see https://phabricator.wikimedia.org/P59726) and restarting CI Jenkins{{!}} [[phab:T360327|T360327]] * 08:14 hashar: Deleting obsolete caches following renaming of Jenkins jobs to drop the `docker` suffix {{!}} [[phab:T360327|T360327]] == 2024-04-05 == * 22:05 James_F: Zuul: [mediawiki/core] Drop experimental mediawiki-core-php80-phan job * 21:52 James_F: Zuul: Inject phan's phan_dependencies even if '-phan' is terminal * 21:35 James_F: Zuul: Rename all jobs to drop '-docker' label for [[phab:T360327|T360327]] * 20:06 James_F: Zuul: [mediawiki/extensions/MultimediaViewer] Make MobileFrontend a phan not test dependency * 19:45 James_F: jforrester@doc1003:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MachineVision/ # [[phab:T352884|T352884]] * 19:39 James_F: Zuul: [mediawiki/extensions/MachineVision] Mark as archived for [[phab:T352884|T352884]] * 14:22 James_F: Docker: Building new images with composer 2.7.2 for [[phab:T360973|T360973]] * 13:39 hashar: Upgrading scap to latest code revision in beta cluster * 13:13 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add MobileFrontend phan dependency * 13:07 James_F: Zuul: Remove Parsoid dependency on Poem for [[phab:T358054|T358054]] == 2024-04-04 == * 17:02 pmiazga: deployment-prep [[phab:T355281|T355281]] executed “mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki --skipclusters=main,echo,growth,mediamoderation,extstore en wikipedia test2wiki test2.wikipedia.beta.wmcloud.org” on deployment-deploy03. * 16:00 dancy: Updating docker-pkg files on contint primary for [[phab:T328472|T328472]] * 14:59 dancy: Updating docker-pkg files on contint primary for [[phab:T328472|T328472]] == 2024-04-03 == * 13:46 James_F: Zuul: [mediawiki/skins/2018] Add basic CI * 13:08 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1016761 https://gerrit.wikimedia.org/r/1016762 https://gerrit.wikimedia.org/r/1016765 == 2024-04-02 == * 14:39 elukey: restart puppetserver on deployment-puppetserver-1 with 5g of Xmx (rather than 7g) - [[phab:T360595|T360595]] * 13:55 elukey: soft reboot deployment-puppetserver-1 - [[phab:T360595|T360595]] * 13:45 elukey: `rm /srv/git/labs/private/.git/hooks/pre-commit` in deployment-puppetserver-1 - [[phab:T360595|T360595]] * 11:51 pmiazga: On deployment-deploy03.deployment-prep executed “mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki --skipclusters=main,echo,growth,mediamoderation en wikipedia test2wiki test2.wikipedia.beta.wmcloud.org” * 09:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1016303 == 2024-03-30 == * 22:03 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1015597, [[phab:T361412|T361412]] * 22:02 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1015632, [[phab:T331817|T331817]] * 22:02 Krinkle: Reloading Zuul to deploy {{Gerrit|Ic3d0d473270}}, [[phab:T249949|T249949]] == 2024-03-29 == * 12:01 hashar: Archived integration/composer # [[phab:T249949|T249949]] == 2024-03-28 == * 22:27 tgr: added toyofuku to deployment-prep * 20:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1015079 * 20:28 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1015361 * 13:07 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1015322 == 2024-03-26 == * 22:55 brennen: unclear if stashbot is working ([[phab:T361073|T361073]]) * 09:47 hashar: Upgrading Quibble to 1.7.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1014043 == 2024-03-25 == * 16:05 hashar: Tag Quibble 1.7.0 @ {{Gerrit|6bbe555d1006e925d5cf760da8f7c6e9374a31bc}} # [[phab:T236222|T236222]] [[phab:T218647|T218647]] [[phab:T360443|T360443]] [[phab:T354141|T354141]] [[phab:T356247|T356247]] [[phab:T357070|T357070]] * 11:44 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Disable PHP 8.2 testing due to Wikibase for [[phab:T360560|T360560]] * 08:09 hashar: gerrit: Deleted operations/debs/wikibugs which was empty and never used # [[phab:T343707|T343707]] == 2024-03-23 == * 16:37 hashar: Reloaded Zuul for "Add dependencies of Femiwiki Teams's extensions" {{!}} https://gerrit.wikimedia.org/r/1013666 {{!}} [[phab:T358970|T358970]] == 2024-03-21 == * 20:28 James_F: Zuul: [Wikibase*] Disable PHP 8.2 testing for more Wikibase repos, for [[phab:T360560|T360560]] * 20:20 James_F: Zuul: [mediawiki/extensions/Math] Disable PHP 8.2 testing for now, for [[phab:T360709|T360709]] * 18:30 James_F: Zuul: Also make PHP 8.2 non-voting for WikibaseLexeme * 14:25 hashar: github: deleted archived repos wikimedia/operations-software-matterircd and wikimedia/operations-software-mattermost # [[phab:T343707|T343707]] * 11:12 James_F: Zuul: Make PHP 8.2 voting for all(*) MediaWiki extensions and skins for [[phab:T352085|T352085]] * 09:51 hashar: Reloaded Zuul for "ml-services: fix huggingface pipeline triggers" https://gerrit.wikimedia.org/r/1013161 {{!}} [[phab:T357986|T357986]] * 09:50 hashar: Reloaded Zuul to enable Sonar Codehealth on EventLogging, MediaSearch, TranslationNotifications and WikiLove # [[phab:T321837|T321837]] * 08:36 hashar: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ # [[phab:T360595|T360595]] == 2024-03-20 == * 16:29 hashar: integration: sudo cumin --force 'name:docker' 'docker buildx prune --force' == 2024-03-19 == * 13:42 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1011303 {{!}} [[phab:T357986|T357986]] * 13:08 hashar: Deleted `language-screenshots-VisualEditor` Jenkins job # [[phab:T360425|T360425]] * 10:57 hashar: Updating Docker images for "dockerfiles: Quibble php-fpm memory_limit to 256M" {{!}} https://gerrit.wikimedia.org/r/1012616 {{!}} [[phab:T356402|T356402]] == 2024-03-18 == * 21:14 James_F: Zuul: [mediawiki/tools/codesniffer] Test in PHP 8.2 and 8.3, now phan is upgraded * 15:02 James_F: Configure REL1_42 branch support, with PHP 8.1+ support only for [[phab:T359838|T359838]] and [[phab:T359868|T359868]] == 2024-03-14 == * 22:14 dancy: Reenabled beta-scap-sync-world * 21:47 dancy: Disabled beta-scap-sync-world job until ssh_known_hosts issues are resolved (see #wikimedia-cloud) * 21:31 andrewbogott: shutting down deployment-puppetdb03, deployment-puppetdb04, deployment-puppetmaster04. These have been replaced with new puppet infra and can be deleted in a couple of weeks if all is well. * 17:18 taavi: reloading zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1011150 * 17:16 taavi: delete debian-glue-buster, debian-glue-buster-non-voting Jenkins jobs * 14:58 pmiazga: on deployment-deploy03 execution “mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en wikipedia test2wiki test2.wikipedia.beta.wmcloud.org” failed with `Query::isWriteQuery called with incorrect flags parameter’ [[phab:T355281|T355281]] * 14:39 pmiazga: on deployment-deploy03 execution “mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki --skipclusters=main,extstore,echo,growth,mediamoderation en wikipedia test2wiki test2.wikipedia.beta.wmcloud.org” failed with `Unknown database test2wiki’ [[phab:T355281|T355281]] == 2024-03-13 == * 22:06 Krinkle: krinkle@deployment-puppetmaster04 Remove live hack from 2021 <https://gerrit.wikimedia.org/r/941479>, [[phab:T357877|T357877]] * 17:32 James_F: Zuul: Make 'standalone' jobs run consecutively, not concurrently * 13:04 James_F: Zuul: Archive some Toolforge projects for [[phab:T359935|T359935]] * 08:31 hashar: gerrit: reactivated account `Conniecc1` using `gerrit set-account 7450 --active` {{!}} [[phab:T360006|T360006]] == 2024-03-12 == * 15:00 hashar: integration: clearing Docker build cache on all Jenkins agents due to [[phab:T338317|T338317]] {{!}} sudo cumin --force 'name:docker' 'docker buildx prune --force' * 14:37 James_F: Zuul: Add Nemoralis to CI allow list * 14:31 James_F: Zuul: [mediawiki/extensions/IPReputation] Add prod template, comment * 14:18 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1010500 * 06:26 hashar: Reloaded Zuul to enable Sonar Codehealth on BounceHander, CommunityConfiguration, GlobalCssJs, WikibaseCirrusSearch - [[phab:T321837|T321837]] == 2024-03-11 == * 20:15 James_F: Zuul: Drop PHP 8.0 jobs from standalone special job * 17:38 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1009218 * 17:00 James_F: Zuul: [mediawiki/extensions/StructuredNavigation] Drop bespoke template * 16:59 James_F: Zuul: Change extension-broken to lint on PHP 8.1 == 2024-03-10 == * 17:19 James_F: Zuul: [mediawiki/extensions/AchievementBadges,FacetedCategory,Sanctions] Basic CI, for [[phab:T358970|T358970]] == 2024-03-08 == * 21:32 James_F: Added ecarg (WMF staff member) as a member of deployment-prep so they can help with Wikifunctions Beta Cluster maintenance. * 18:13 sandeep: Starting gitlab cloud runners k8s cluster upgrade ([[phab:T359594|T359594]]) == 2024-03-07 == * 17:32 dancy: Upgrading scap to latest code revision in beta cluster == 2024-03-03 == * 15:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1008014 == 2024-03-02 == * 23:24 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1007742 https://gerrit.wikimedia.org/r/1007949 == 2024-03-01 == * 01:03 bd808: Added RLazarus as a project member * 00:06 dancy: dancy@deployment-puppetmaster04.deployment-prep.eqiad.wmflabs Blocked a beta wikis abuser in varnish ([[phab:T358817|T358817]]) == 2024-02-28 == * 22:23 dancy: jjb-update beta-code-update-eqiad * 16:26 dancy: Upgrading scap to latest code revision in beta cluster * 14:17 James_F: Zuul: [labs/tools/wikibugs2] Mark as archived for [[phab:T358630|T358630]] * 14:10 James_F: Zuul: [mediawiki/extensions/Cite] Publish JS docs post-merge for [[phab:T358641|T358641]] * 14:10 James_F: Zuul: [mediawiki/extensions/MagicNumberedHeadings] Mark as archived for [[phab:T353966|T353966]] * 14:02 James_F: Zuul: [mediawiki/extensions/ImageCompare] Mark as archived for [[phab:T353968|T353968]] * 13:53 James_F: Zuul: [wikimedia/iegreview] Mark as archived for [[phab:T351889|T351889]] == 2024-02-27 == * 20:32 James_F: Disabling deployment-db12 and deployment-db13 as follow-up to [[phab:T358329|T358329]]; no longer used * 14:57 hashar: Remove mediawiki-selenium rubygems documentation since is no more used / published and has been archived # [[phab:T242293|T242293]] * 13:54 James_F: Zuul: [mediawiki/extensions/GlobalBlocking] Enable 'extension-codehealth' for [[phab:T321837|T321837]] == 2024-02-26 == * 22:04 James_F: Deleting deployment-db09, decommissioned 11 months ago but never deleted in [[phab:T331019|T331019]] * 17:29 apergos: waiting on Amir's input as per his comment on [[phab:T358329|T358329]]; db14 remains up but not replicating. * 16:38 TheresNoTime: deployment-prep, db11, `mariabackup --innobackupex --apply-log --use-memory=10G /srv/sqldata` [[phab:T358329|T358329]] * 15:36 TheresNoTime: deployment-prep db11 set `open-files-limit = 4094` in `/etc/my.cnf`, then did `root@deployment-db11:~# systemctl restart mysqld` * 15:35 TheresNoTime: deployment-prep db11 `root@BETA[(none)]> drop database test2wiki;` * 15:11 TheresNoTime: deployment-prep, db11, `root@BETA[(none)]> set global read_only = false;` while I figure out mariabackup error [[phab:T358329|T358329]] * 15:10 TheresNoTime: deployment-prep prev. command resulted in `2024-02-26 15:06:43 0 [ERROR] InnoDB: Operating system error number 24 in a file operation.` [[phab:T358329|T358329]] * 15:09 TheresNoTime: deployment-prep `root@deployment-db11:~# mariabackup --innobackupex --stream=xbstream /srv/sqldata --user=root --host=127.0.0.1 --slave-info {{!}} nc deployment-db14 9210` [[phab:T358329|T358329]] * 15:05 TheresNoTime: deployment-prep, db11, `root@BETA[(none)]> set global read_only = true;` [[phab:T358329|T358329]] * 14:59 TheresNoTime: deployment-prep, `root@deployment-db14:~# /opt/wmf-mariadb106/scripts/mysql_install_db --user=mysql --basedir=/opt/wmf-mariadb106 --datadir=/srv/sqldata` [[phab:T358329|T358329]] * 14:35 TheresNoTime: deployment-prep, starting the creation of `deployment-db14` per https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Databases for [[phab:T358329|T358329]] == 2024-02-23 == * 16:26 pmiazga: executing mwscript update.php --wiki=aawiki --quick --skip-config-validation to check if this is going to timeout as it timeouts in `beta-update-databases-eqiad` Jenkins job * 12:30 jnuche: temporarily disabled https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ until it can be troubleshooted == 2024-02-22 == * 16:36 hashar: Reloaded Zuul to enable Sonar Codehealth on ContactPage, Disambiguator, FlaggedRevs and ReadingLists # [[phab:T321837|T321837]] * 16:15 dancy: gitlab-cloud-runners upgraded to v16.7.1 == 2024-02-21 == * 18:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1005564 == 2024-02-20 == * 21:07 James_F: Zuul: Remove Parsoid dependency on Cite, Disambiguator, and ImageMap, for [[phab:T354215|T354215]] == 2024-02-19 == * 15:28 jnuche: reload zuul for https://gerrit.wikimedia.org/r/1003437 * 10:40 hashar: Updated Jenkins jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/1004622 * 10:25 hashar: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1004245 * 10:06 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/1004099 # [[phab:T355194|T355194]] * 02:57 TimStarling: on deployment-deploy03 running foreachwiki maintenance/migrateBlocks.php again ([[phab:T355034|T355034]], [[phab:T357366|T357366]]) * 02:54 TimStarling: on deployment-deploy03 truncate tables block and block_target on all beta wikis [[phab:T357366|T357366]] == 2024-02-14 == * 14:50 James_F: Zuul: [mediawiki/vendor] Test composer lint with PHP 8.0+ too for [[phab:T300463|T300463]] [[phab:T316078|T316078]] [[phab:T352085|T352085]] [[phab:T353362|T353362]] == 2024-02-13 == * 21:50 brennen: add kostajh to acl*phabricator per [[phab:T355405|T355405]] == 2024-02-12 == * 19:57 dancy: Attempting to reboot deployment-parsoid12.deployment-prep * 19:53 dancy: Updating development images on contint primary for [[phab:T349201|T349201]] * 15:06 James_F: Zuul: [mediawiki/extensions/VisualData] Enable basic quibble CI * 15:01 James_F: Add BlueSpiceVisualEditorConnector as dependency to BlueSpiceQrCode * 14:56 James_F: Zuul: [mediawiki/extensions/PlaceNewSection] Place in right section == 2024-02-09 == * 15:05 James_F: Zuul: Do not test patches having CR+2 for [[phab:T357080|T357080]] * 12:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/999701 * 12:43 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/999694 == 2024-02-08 == * 22:41 TimStarling: on deployment-deploy03 running foreachwiki maintenance/migrateBlocks.php for [[phab:T355034|T355034]] * 20:32 James_F: Zuul: [mediawiki/tools/phan] Enable PHP 8.2 & 8.3 jobs == 2024-02-07 == * 15:15 James_F: Zuul: [mediawiki/extensions/AutoModerator] Add prod template * 14:58 taavi: reload zuul for https://gerrit.wikimedia.org/r/998386 == 2024-02-06 == * 20:42 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/997895 * 14:57 hashar: Reloaded Zuul to enable Sonar Codehealth on CentralNotice, Citoid, cldr and TemplateWizard # [[phab:T321837|T321837]] == 2024-02-05 == * 19:34 dancy: Disable https://integration.wikimedia.org/ci/job/beta-scap-sync-world/ while dealing with storage space issues on deployment-mediawiki12.deployment-prep.eqiad1.wikimedia.cloud * 19:04 James_F: jforrester@deployment-mediawiki12:/srv/mediawiki/php-master/cache/l10n/upstream$ sudo rm -rf .~tmp~/ # Clean up disk usage spotted by Dancy == 2024-02-04 == * 00:03 James_F: [18:54:23] <+wikibugs> (PS1) Jforrester: Zuul: [mediawiki/extensions/DiscordRCFeed] Install basic CI == 2024-02-03 == * 22:08 James_F: sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/GettingStarted/ # [[phab:T292654|T292654]] == 2024-02-02 == * 21:03 James_F: Zuul: [mediawiki/extensions/ReadingLists] Enable Mocha tests for [[phab:T355648|T355648]] == 2024-02-01 == * 17:19 James_F: Revert "Zuul: [mediawiki/extensions/WikiLambda] Run selenium tests again * 15:29 James_F: Zuul: [mediawiki/extensions/VisualEditor] Re-add DiscussionTools dependency CC [[phab:T319202|T319202]] * 15:24 James_F: Zuul: [mw/ext/BlueSpiceBookshelf] Add dependency on MenuEditor == 2024-01-31 == * 17:48 MatmaRex: (output: https://phabricator.wikimedia.org/P54503#226547) * 17:45 MatmaRex: matmarex@deployment-mwmaint02:~$ mwscript DiscussionTools:persistRevisionThreadItems --wiki=metawiki --all --current * 17:44 MatmaRex: matmarex@deployment-mwmaint02:~$ mwscript DiscussionTools:persistRevisionThreadItems --wiki=enwiktionary --all --current * 13:48 urbanecm: deployment-prep: Create `wikimedia_campaign_events_grant` DB table in wikishared ([[phab:T348444|T348444]]) * 12:02 hashar: Force updating tags in the CI git mirrors for [[phab:T356247|T356247]]: sudo cumin --force 'name:docker' "/usr/bin/find /srv/git -type d -name '*.git' -exec git -C <nowiki>{</nowiki><nowiki>}</nowiki> fetch origin --prune --prune-tags --force \;" == 2024-01-30 == * 17:40 brennen: phab1004: testing public_task_dump.py tweaks, take 3 * 10:16 jnuche: CI Jenkins update to 2.426.3 complete * 10:12 jnuche: updating CI Jenkins version to 2.426.3 to address vulnerabilities == 2024-01-29 == * 23:05 brennen: phab1004: testing public_task_dump.py tweaks, take 2 (write dump to /home/brennen to avoid any potential non-public task leaks if logic is wrong) * 21:59 brennen: phab1004: testing public_task_dump.py tweaks * 21:10 James_F: Zuul: Add selenium experimental job to extension-quibble-noselenium * 21:01 dancy: Created gitlab-mentions-bot account on GitLab ([[phab:T289712|T289712]]) == 2024-01-27 == * 21:35 taavi: reload zuul for 993234 == 2024-01-26 == * 13:44 hashar: gerrit: deleting obsolete references in All-Users.git under refs/starred-changes/* to only keep the one having a `star` label # [[phab:T355794|T355794]] == 2024-01-25 == * 20:39 hashar: Reloaded Zuul to enable Sonar Codehealth on CodeMirror, GuidedTour, Wikisource and XAnalytics # [[phab:T321837|T321837]] * 19:17 James_F: Docker: [quibble*] Re-install php-ldap, mistakenly dropped * 16:32 James_F: Docker: Install php-ldap in top-level images for simplicity == 2024-01-22 == * 11:05 hashar: Branched stable-3.7 on upstream https://gerrit.googlesource.com/plugins/zuul # [[phab:T355521|T355521]] * 08:47 hashar: integration-castor05: `sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwgate-node18-docker` # [[phab:T295351|T295351]] == 2024-01-18 == * 22:16 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/991635 * 21:40 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=dewiki --delete --old '' --fromuserid=52 --nowarn 'echo-subscriptions-web-reverted'` ([[phab:T353225|T353225]]) * 21:39 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=dewiki --delete --old 1 --fromuserid=52 --nowarn 'echo-subscriptions-email-article-linked'` ([[phab:T353225|T353225]]) * 21:39 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=dewiki --delete --old 1 --fromuserid=52 --nowarn 'echo-subscriptions-email-mention'` ([[phab:T353225|T353225]]) * 21:37 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=dewiki --delete --old 1 --fromuserid=52 --nowarn 'echo-subscriptions-web-article-linked'` ([[phab:T353225|T353225]]) * 21:32 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=enwiki --delete --old 1 --fromuserid=906 --nowarn 'echo-subscriptions-email-article-linked'` ([[phab:T353225|T353225]]) * 21:31 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=enwiki --delete --old 1 --fromuserid=906 --nowarn 'echo-subscriptions-email-mention'` ([[phab:T353225|T353225]]) * 21:30 urbanecm: deployment-prep: `urbanecm@deployment-mwmaint02:~$ mwscript userOptions.php --wiki=enwiki --delete --old 1 --fromuserid=906 --nowarn 'echo-subscriptions-web-article-linked'` ([[phab:T353225|T353225]]) * 20:58 James_F: Zuul: Drop rust-coverage-publish job usage, broken (for [[phab:T354594|T354594]]) * 17:23 James_F: Zuul: [mediawiki/extensions/LocalisationUpdate] Archive for [[phab:T309694|T309694]] * 17:23 James_F: jforrester@doc1003:/srv/doc/cover-extensions$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/LocalisationUpdate/ # [[phab:T309694|T309694]] * 15:32 hashar: Archived integration/utils-rs {{!}} [[phab:T354594|T354594]] * 11:54 urbanecm: deployment-prep: `mwscript userOptions.php --wiki=enwiki --delete --old '' --fromuserid=906 --nowarn 'echo-subscriptions-web-reverted'` ([[phab:T353225|T353225]]) == 2024-01-17 == * 12:25 urbanecm: deployment-prep: `foreachwiki userOptions.php --old-is-default --old=2 --new 1 --nowarn echo-subscriptions-web-reverted`([[phab:T353225|T353225]]) * 12:02 TheresNoTime: soft-rebooted deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud [[phab:T355218|T355218]] == 2024-01-16 == * 16:03 dancy: Updated gitlab-runner to 16.5.0 in gitlab-cloud-runners. * 14:56 James_F: Zuul: [mediawiki/extensions/ConfirmEdit] Run mwgate-tox-docker for [[phab:T355090|T355090]] == 2024-01-15 == * 15:25 hashar: gerrit: archived All-Avatars.git # [[phab:T191183|T191183]] == 2024-01-12 == * 22:35 RhinosF1: that's on phab * 22:35 RhinosF1: disabled cchen # [[phab:T354961|T354961]] - contact T&S for info * 22:34 James_F: Zuul: [mediawiki/extensions/NetworkSession] Add basic CI for [[phab:T354976|T354976]] * 15:18 James_F: Docker: Publishing Quibble images using Node 18 not 16 for [[phab:T331180|T331180]] == 2024-01-11 == * 15:39 James_F: Zuul: Mass-migrate all node16 jobs to node18 for [[phab:T331180|T331180]] * 14:43 James_F: Zuul: [performance/excimer-ui-server] Drop PHP 7.3 testing == 2024-01-10 == * 20:50 brennen: gitlab access: added joal as owner to /repos/ci-tools/wmf-jvm-parent-pom * 16:12 hashar: Building Docker images for https://gerrit.wikimedia.org/r/983247 == 2024-01-09 == * 15:04 James_F: zuul: [mediawiki/extensions/WikimediaCampaignEvents] Promote to Wikimedia prod section for [[phab:T347894|T347894]] * 13:42 James_F: Zuul: [mediawiki/skins/Vector] Drop documentation job * 08:53 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/984222 {{!}} [mediawiki/extensions/Wikibase] Disable Phan on release branches {{!}} [[phab:T226945|T226945]] [[phab:T231966|T231966]] == 2024-01-08 == * 16:59 James_F: Docker: [php-ast] Install php-ast v1.1.1 and cascade * 15:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/988653 # [[phab:T354534|T354534]] * 15:47 hashar: phabricator: archiving diffusions repos rAWWO rAWWS rAWWB rAWWD rAWWU rAWCM rAWWV rAWWH rAWWJ rWDGD rAWWT # [[phab:T354534|T354534]] * 15:34 hashar: gerrit: arrchived analytics/wmde/WDCM* repositories # [[phab:T354534|T354534]] * 14:19 James_F: Docker: Publishing new CI images with git safe.directory disabled for [[phab:T354409|T354409]] * 09:22 hashar: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwgate-node16-docker/ == 2024-01-05 == * 23:07 thcipriani: reloading zuul to deploy https://gerrit.wikimedia.org/r/988109 * 21:15 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/988080 * 10:40 hashar: Rollback Quibble Jenkins jobs update due to broken `git` introduced by https://gerrit.wikimedia.org/r/c/integration/config/+/983892 * 10:38 hashar: Updated Quibble Jenkins jobs to latest images ( https://gerrit.wikimedia.org/r/c/integration/config/+/987790 and https://gerrit.wikimedia.org/r/c/integration/config/+/983892 ) == 2024-01-01 == * 15:41 taavi: reloading zuul for 986673 {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 50uisdoj2x1rwqcbdb06frqa3evp47f MariaDB/Backups 0 23732 2243254 2146633 2024-11-11T11:28:00Z JCrespo (WMF) 5427 /* Recovering the data (Mariadb 10.6) */ format fix 2243254 wikitext text/x-wiki {{Navigation Wikimedia infrastructure|expand=db}} [[File:Backing_up_Wikipedia_Databases.pdf|thumb|Presentation summary]] == Backup and recovery methods == The following types of backups are part of the design: * Bi-daily compressed binary backups (snapshots) for disaster recovery (1 month retention) * Weekly logical backups for long term recovery (3 month retention) * Binlog backups (for point in time recovery/incremental-ish backups) [Implementation ongoing, we have 10+ copies of 30 last days over multiple servers on 2 DCs] * Incremental backups for append-only external storage servers [Not implemented yet, lower priority, external store servers are backed up using regular logical backups right now] == Physical servers and services part of, or related to MariaDB backups == [[File:Database backups overview.svg|thumb|Architecture of Wikimedia Foundation infrastructure Database backups (high level overview of services and data and control flows)]] * The many '''production database servers''' that are used by Mediawiki and many other analytics, cloud-support and other misc services. These are normally called <code>db[12]*</code> (database) servers, in addition to the larger <code>es[12]*</code> (external storage) servers ** There are other mysql-focused servers, like <code>pc*</code> and <code>clouddb*</code> (formerly, <code>labsdb*</code>) hosts, but those are not backed up- in the first case, because they are essentially disk cache (data can be lost without losing user data), in the second because they don't contain but a filtered replica of production data ** es servers follow the same logical backup as metadata servers, although in the future we may change into more optimized incremental backups ** For a summary of the functionality of the different data servers, see: [[MariaDB#Sections_and_shards]] * There are specific '''dedicated backup replicas''', whose only function is to mirror the production datasets and provide a source of backups with enough performance and reliability, without affecting the production hosts due to the load and locking on generating backups * '''Bacula service''', storing backups (only the logical and incremental) long-term, reading from the provisioning servers and sending it to the bacula storage nodes * '''Provisioning servers''': storing backups short term for quick recovery, provisioning and the post-processing needs (e.g. rotation, xtrabackup --prepare, compression, consolidation if many files are created). They, themselves, orchestrate the logical (dumps) backups. * '''Testing databases''': db1133 and db2102 (one host per datacenter is procured). They are non-production hosts that regularly recover logical backups and snapshots and set up replication to verify backups are working properly [not yet implemented] * '''Cluster management servers''' (aka cumin hosts: {{CuminHosts}}): provide orchestration, specifically for snapshots, as they require remote execution -beyond the mysql protocol- to execute root commands on the source servers and sending files to the provisioning servers * '''dbbackups''', database (currently at m1 section) stores metadata information about the generated backups at generation time, and informs about the current ongoing status as well as the ''a priori'' success or failure of backups. It also stores the size and name of each file generated, for further data and backup analysis and trending. An icinga check is also setup there which alerts if the latest fresh backup is older than the configured amount of time. == Software and deployment architecture == [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/wmfbackups/+/refs/heads/master/wmfbackups/WMFBackup.py WMFBackup] is the main class controlling the generation of backups. It is an extensible backup class that at the time can use 3 class-methods: * NullBackup: Does nothing * MariaBackup: Uses <code>mariabackup</code> utility, fork of XtraBackup just recompiled to support MariaDB specific InnoDB format. MariaBackup/XtraBackup allow for (in theory) lower time to recovery, as it will be as fast as putting the files into the dir and starting mysql. It is, at the time, the chosen method to generate what we call generically "snapshots" (binary or raw backups). * MyDumper: Uses <code>mydumper</code> to generate a fast, highly parallel, compressed logical dump. It is, at the time, the chosen method to generate what we call "dumps" (logical backups). The backup process has 4 main functions: * Generate the backup files (e.g run <code>mydumper</code> or <code>mariabackup</code>) * Post-process it, with a number of compulsory and optional tasks: check the backup seems complete, prepare it (in the case of mariabackup), consolidate (tar) per database, compress, and rotate to its final location. * Generate state and metadata about the backup and the file it generates (by the <code>BackupStatistics</code> class) Normally the class functionality is controlled by the command line utility <code>[https://phabricator.wikimedia.org/diffusion/OSWB/browse/master/wmfbackups/cli/backup_mariadb.py backup-mariadb]</code>, documented at [[Backup-mariadb]]. <code>backup-mariadb</code> utility and its libraries are deployed on the provisioning host. However, snapshotting requires remote root execution therefore we can only do local mysql snapshots with it. To generate remote snapshots, the [[Transfer.py]] script is being used for the first part of the backup, installed on the cluster management servers ({{CuminHosts}}). Every day, a cron job runs <code>remote-backup-mariadb</code> script, and send the snapshots to the provisioning hosts using <code>transfer.py</code>. Then, it runs locally on the provisioning host <code>backup-mariadb</code> in order to post-process the generated files and gather the metadata statistics. <code>transfer.py</code> is a generic utility, installed on the cluster management (orchestration) hosts ({{CuminHosts}}) to transfer files over the network, but has a switch <code>--type=xtrabackup</code>, that allows also to transmit in a consistent way the mysql files of a live mysql server. Please note that the post-process of <code>backup-mariadb</code> is still needed after <code>transfer.py</code>, as <code>xtrabackup</code> (or <code>mariabackup</code>) <code>--prepare</code> has to be run before recovering the files to a server. The backups are stored in the following location: /srv /backups /dumps /ongoing # ongoing logical backups /latest # latest completed logical backup /archive # recent, but not latest, logical backups (in case the latest has issues), regularly purged /snaphosts /ongoing # ongoing mariabackup files /latest # latest completed and prepared mariabackup tarballs /archive # recent, but not latest, mariabackup tarballs (in case the latest has issues), regularly purged Finally, bacula regularly backups the configured path on the chosen active datacenter for long term backups (right now, eqiad) copying only the latest logical backups (<code>/srv/backups/dumps/latest</code>). Puppet code is distributed into the following profiles: * <code>mariadb::backup::bacula</code> [provisioning host] to move logical dumps to long term storage (bacula) * <code>mariadb::backup::check</code> [currently on the alerting hosts, cbut could really be run from almost anywhere] icinga checks that backups are generated correctly and fresh * <code>mariadb::backup::mydumper</code> [provisioning host] automation of logical backups, and backup and recovery software in general * <code>mariadb::backup::snapshot</code> [provisioning host] snapshotting environment * <code>mariadb::backup::transfer</code> [orchestration hosts] automation of snapshots Incrementals and binlogs are not productionized/fully automated yet (although one can find binlogs and manual es backups in several locations). A summary of the flow of control and data can be seen at the diagram: [[:File:Database_backups_overview.svg|Database_backups_overview.svg]] == Logical backups == Mydumper is used for creating logical backups. Not only it is faster to create and recover than mysqldump, it also allows table or even lower level granularity- very interesting for when a single table needs to be restored. On the host with the profile <code>mariadb::backup::mydumper</code> (provisioning hosts), a configuration file exists on <code>/etc/mysql/backups.yaml</code> that configures what and where and how to backup. Weekly, the script <code>backup-mariadb</code> reads that config file and backups the core and misc sections into <code>/srv/backups/dumps</code>. The backup host -at the moment dbstore1001 and es2001- only keeps the last or latest backups so they can be sent to bacula, not long term. The backups are generated asynchronously to the bacula director handling because with the old system, it used to block all other backup and recovery processes for a long time. Now bacula only has to retrieve the latest directory and get the latest successful backups. Each backup is a single directory with the following name: dump.section_name.timestamp timestamp is not in ISO format as ':' is a special symbol by some languages, like the command prompt. It is in the following format instead: YYYY-MM-DD--HH-mm-ss Each directory contains the same structure than a regular mydumper process generates, except that, if so configured, it can tar all objects of a single database to avoid having hundreds of thousands of files. This currently happens for x1 and s3, which have many thousand of objects- that gets reduced to only ~900, one per database. To understand the status of the backups, mydumper creates a <code>metadata</code> file that contains the start of the backup process (which the dump should be consistent to, the exact binlog and GTID coordinates for that consistency point, and the time the backup finished. If the backup fails, normally the database dir will not copied to latest and a log file on the ongoing directory with the name of the section will show some kind of error. If for some reason mydumper was successful but the overall process cannot, one can retry the backup with the <code>--only-postprocess</code> option to rotate, consolidate, compress and/or generate statistics. === Orchestration and scheduling === At the moment, logical dumps of all Mediawiki metadata sections, as well as content, [https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mariadb/backup/mydumper.pp$75 happens every week on Tuesday UTC mornings] (check puppet for the latest schedule). This is initiated with a cron job on each db provisioning host, that reads the /etc/wmfbackups/backups.cnf on the local dbprov host and controls which sections to backup, with which options. === Adding a new dump === # Add <code>EVENT, LOCK TABLES, SELECT, SHOW VIEW, TRIGGER</code> grants to the given backup user for the database backup # Add also <code>FILE, RELOAD, REPLICATION CLIENT, SUPER</code> to the same user on *.* # Make sure, if exists, that the regex filtering backups or not the appropriate objects # Add or otherwise make sure that the host is backed in eqiad and/or codfw on <code>puppet://modules/profile/templates/mariadb/backups-*.cnf.erb</code>. The right location will try to balance the load and disk space used of the several hosts used for backup # Make sure your backups are done as scheduled, by looking at dbbackups db on m1, tables backups and backup_files. === Recovering a logical backup === To recover, there is a script called [[recover-dump]] which automates the decompression (if a .tar.gz), untar (if databases consolidated) and runs myloader. If for some reason that wouldn't work, myloader can be called directly (assuming the directory is not compressed, tarred or that has been handled beforehand), or even load individual objects directly (mydumper creates a .sql with the structure and a .sql data file per table). This is a step by step guide of how to recover a backup. For this example we will recover: '''x1''' on '''db1120''' from the backup host '''es2001''' ==== Pre-requisites before recovering the backup ==== * db1120 must have a MySQL server up and running (if it is a new host follow [[MariaDB#Setting_up_a_fresh_server]] to set it up). * db1120 must have privileges to let the source host (e.g. dbprov1001) connect, create the databases and tables (and other objects) and import the rows ** This can be tested with ''dbprov1001$ mysql -hdb1120.eqiad.wmnet -uroot -pREDACTED'' {{Notice| Tip: Some of us like to copy the source backup dir about to be recovered to the SSDs (/srv/backups/dumps/ongoing) for several reasons: * It will avoid errors if rotation happens- unlike snapshots, that are a single file, moving all the files on the dir away for rotation when a new backup happens will make the recovery process fail * If it is an "archived" backup (x1, s3) the untarring process that happens before the actual recovery may be faster * If you want to recover only one db or table, it is safer to delete the parts that won't be recovered in a copy, rather than using (or accidentally modifying!) the original set of files }} ==== Recovering the data ==== It is recommended to run the following command in a '''screen''' session: dbprov1001:~# recover-dump --host db1120.eqiad.wmnet --user root --password REDACTED --port 3306 x1 Attempting to recover "dump.x1.2018-07-24--23-30-38" Running myloader... Alternatively, if a specific backup has to be recovered, different from the one on latest with the given section, it also accepts absolute paths: dbprov1001:~# recover-dump --host db1120.eqiad.wmnet --user root --password REDACTED --port 3306 /srv/backups/dumps/archive/dump.x1.2018-07-24--23-30-38 Remember to provide an '''absolute path to the dir or compressed .tar.gz''' if using the second format. it doesn't have to be inside /srv/backups/, it can be anywhere where your current user can read. This will start recovering all the data on db1120 for the whole x1 section. We can check if this is actually working by checking if the data directory on db1120 is growing db1120:~# du -s /srv/sqldata/ This can take several hours and once finished, es2001 will return to the prompt. And the data is ready on db1120 to be analyzed and processed if needed. ==== Recovering the data (Mariadb 10.6) ==== This method should work for both 10.4 or 10.6 (or other versions), but it is the only one that works reliably in 10.6. It is actually faster and fixes some bug in myloader, but it is not properly implemented yet (it is a puppet bash script with very few options). * First, create a credentials file under <code>/root/.my.cnf</code>, like this: [client] host = db1024.eqiad.wmnet port = 3306 user = root password = <same password than the user grant added to the server for recovery> This file has the same options as the mysql command line client, and it is the running mysql server where the data will be sent to. * Second, run the command: <code>mini_loader.sh recovery_directory</code> where the only parameter is the mysql dump directory to recover. For example: mini_loader.sh /srv/backups/dumps/dump.x1.2018-02-28--01-20-46 * The recovery will inform of the progress by printing the start and finish of each file processed, or errors detected, like this: Starting recovery at 2022-12-02 12:57:56+00:00 Creating database testwiki... Database testwiki created successfully ... Creating table testwiki.revision... Table testwiki.revision created successfully Importing data to table testwiki.revision... Table testwiki.revision imported successfully ... Finishing recovery at 2022-12-02 23:27:59+00:00 Remember to remove /root/.my.cnf to prevent accidental loads * The script will remind you to remove <code>/root/.my.cnf</code> after the load, to prevent accidental recoveries after the fact ==== Enabling replication on the recovered server ==== If the server needs to be pooled in production, firstly we have to enable replication so it can catch up. First of all we should create the ''heartbeat'' table on our server. CREATE DATABASE heartbeat; USE heartbeat; CREATE TABLE `heartbeat` ( `ts` varbinary(26) NOT NULL, `server_id` int(10) unsigned NOT NULL, `file` varbinary(255) DEFAULT NULL, `position` bigint(20) unsigned DEFAULT NULL, `relay_master_log_file` varbinary(255) DEFAULT NULL, `exec_master_log_pos` bigint(20) unsigned DEFAULT NULL, `shard` varbinary(10) DEFAULT NULL, `datacenter` binary(5) DEFAULT NULL, PRIMARY KEY (`server_id`) ) ENGINE=InnoDB DEFAULT CHARSET=binary; And we should now insert a couple of rows with the server_id of the replication chain masters (eqiad and codfw just to be sure). To do so we can check: https://tendril.wikimedia.org/tree and check our section and masters. For x1 our masters are: ''db1069'' and ''db2034'' Let's find out the server_id root@cumin1001:~# mysql.py -hdb2034 -e "SELECT @@hostname; SELECT @@server_id" -BN db2034 180355159 root@cumin1001:~# mysql.py -hdb1069 -e "SELECT @@hostname; SELECT @@server_id" -BN db1069 171966572 Now let's insert those two rows on '''db1120''' USE heartbeat; SET SESSION sql_log_bin=0; INSERT INTO heartbeat (server_id) VALUES (171966572); INSERT INTO heartbeat (server_id) VALUES (180355159); To do so, we have to gather the coordinates from the backup which are in a file called '''metadata'''. In our case the backup is at ''dump.x1.2018-07-24--23-30-38'' es2001:~# cat /srv/backups/latest/dump.x1.2018-07-24--23-30-38/metadata Started dump at: 2018-07-24 23:30:38 SHOW SLAVE STATUS: Connection name: Host: db2034.codfw.wmnet Log: db2034-bin.000196 Pos: 978121166 GTID:0-171970580-683331037,1-171970580-1,171966572-171966572-191034075,171970580-171970580-596994206,171974681-171974681-198565537,180355159-180355159-13448767,180363268-180363268-40608909 Finished dump at: 2018-07-25 00:32:34 What we need to enable replication: * Host * Log * Pos Once we have those we can execute the following command on db1120 (the password is at ''repl-password'' file on the ''pw'' repo) CHANGE MASTER to MASTER_HOST='db2034.codfw.wmnet', MASTER_USER='repl', MASTER_PASSWORD='REDACTED' ,MASTER_PORT=3306, MASTER_LOG_FILE='db2034-bin.000196', MASTER_LOG_POS=978121166, MASTER_SSL=1; START SLAVE; SHOW SLAVE STATUS\G We should see our ''Seconds_behind_master'' decreasing (sometimes it can take a while to see it decreasing) If we have recovered an eqiad host from a codfw source, once the server has caught up, we need to move it under the eqiad master. In our case, db1120 is replicating from db2034 which is a codfw master, so we need to move them to be at the same level, that is: db1120 must be a sibling of db2034 and not a child. To do so we can use ''''repl.pl'''' script on ''''neodymium'''': ./marostegui/git/software/dbtools/repl.pl --switch-child-to-sibling --parent=db2034.codfw.wmnet:3306 --child=db1120.eqiad.wmnet:3306 == Snapshoting and disaster recovery == While logical backups have a lot of advantages: * '''Small disk footprint''' (specially if compressed). High compression ratio and no space wasted on indexes or fragmented data. * '''Fast to generate''': If enough data is on memory, reads can be very fast, specially if done with enough parallelism * '''Very low granularity on recovery''': Because we use one (or several) separate files per table, we can recover single databases, single tables or even individual rows if properly filtered. We can even separately recovered structure and data. * '''Not prone to corruption''': A physical copy of file would maintain and make very difficult to detect certain kinds of corruption. Because a logical dump requires reading all rows, once exported, corruption cannot happen unless the exports are themselves corrupted at a later time. * '''Software independency and portability''': Because in the end we are just generating text files, the format has great portability. It can be used on different MySQL/MariaDB versions, vendors or even on different database software. Also, they can converted if needed as they are human-readable text/manageable with 3rd party software. Because many of the above, this format is ideal for long-term preservation. However, logical dumps have some important weaknesses: * '''They are slow to recover''', as they have to be reimported row by row, as well as indexes have to be regenerated again. * Taking a dump can create a lot of '''performance impact''' both due to the amount of logical reads need, and the state in which they leave the buffer pool afterwards Because of this, logical backups tend to increase a lot the ''Time to Recovery'' in case of a full disaster. That is where '''snapshots (how we call generically raw or binary backups)''' come into play: While larger in size, they tend to be faster to recover as they use the native database format- only requiring to shutdown the original server and copy back the files- making it as fast as a regular file copy can be sent over the network. Snapshots are the bases of our "fast" disaster recovery method. Snapshots can be generated in several ways: * Cold backup (shutdown the server and copy the files) * lvm snapshots + recovery * MySQL Enterprise backup/Xtrabackup We chose, after some research ({{phabricator|T206204}}), to use the latest option, specifically with MariaDB's fork of the free software Percona XtraBackup, '''MariaBackup''', which is mostly identical in usage and functionality to XtraBackup, but is compiled with MariaDB and thus supports its internal format better (Xtrabackup started having issues with MariaDB since 10.0). <code>MariaBackup</code> is one of the supported methods in the WMFBackup Class, which means it can be taken using <code>backup-mariadb</code> command line utility. The main difference between snapshot taking and logical backups is that snapshots requires raw access to the underlying mariadb files as a privileged user in addition to access to mysql itself as a privileged account. Thus, <code>backup-mariadb</code> only is able to create localhost backups by itself (more on remote backups later). When used, files are copied to <code>/srv/backups/snapshots/ongoing</code>, in what would appear as a complete datadir copy. However, at this point the backup process is not complete- as documented on the [https://www.percona.com/doc/percona-xtrabackup/ XtraBackup documentation page], the backup '''needs to be prepared before being used'''. The preparation step of the backup takes care of that before continuing with the rest of post-processing steps. At the moment preparation happens always after backup, but there are reasons to postpone this (e.g. to generate incremental or differential backups, or to export individual tables). This would be possible in the future, but not currently supported. Special mention requires that one normally would want to '''compress''' the final set of files for several reasons: The most obvious is space saving- snapshots are as large as the original datadir. The second reason would be that while normally mydumper can be used to recover individual parts of the database, normally snapshots are used for full recoveries and provisioning only, thus them being pre-compressed speeds up the later full recovery. === On Remote Snapshotting === With the above implementation, local snapshotting would be possible; however a backups is only a backup if it is fully offline. Several options and designs were considered for this- in particular, the possibility of preparing backups locally to the host before sending them away on some way. This was discarded for the following reasons: 1) read access was needed to the datadir, making it difficult to do privilege separation; 2) it is not out of the question that a sever may not have enough space to temporarily store a copy of its database (specially under emergency, which is what snapshotting was intended for) and 3) if preparation was compulsory from the start, it would make incremental/differential backups impossible in the future. Based on that, database hosts would only run <code>xtrabackup (mariabackup) --backup</code> and then use its streaming capabilities to send its data away, to be prepared on a second step. For that purpose, existing tool <code>transfer.py</code>, used in the past to perform cold backup transfer as well as general file transfer between hosts over the network was modified to allow xtrabackup as the source of data transference. Once transferred to the provisioning host(s), to <code>/srv/backups/snapshots/ongoing</code>, the utility <code>backup-mariadb</code> is run, although with the <code>--only-postprocess</code> option so it finds and treats its, now, local files, but does not attempt to generate a new snapshot. So, in summary, while logical backups and snapshots share most of the workflow, because of the particularities of creating a remote snapshot, this requires an extra initial step where first files are transferred and taken out of the source hosts (unlike logical backups, which are able to perform that just by using the mysql query protocol). === Orchestration and Scheduling === Because the remote copy mentioned on the previous section, remote execution is needed, something that is not available at the time (and probably never, for security reasons) from the provisioning hosts. This is why the transfer itself has to be initiated from the cluster management hosts ({{CuminHosts}}). Probably not surprisingly, cumin is being used for remote execution. At the moment, snapshotting of all Mediawiki metadata sections happens every 1 or 2 days (4 days a week): new backups are finish at the moment on [https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mariadb/backup/transfer.pp$34 Mondays, Wednesdays, Thursdays and Saturdays] (check puppet for the latest schedule). This is initiated by a systemd timer, that reads the <code>/etc/wmfbackups/remote_backups.cnf</code> yaml configuration files on the cluster management server and controls which sections to backup, with which options, as well as to which server those backups are sent. The systemd timer, rather than directly calling transfer.py and backup-mariadb, uses <code>remote-backup-mariadb</code> simple script, which does the above steps for simplicity. Transfer.py has locking logic to know if to open a new port for each transfer, starting with the number 4400, and up if more than one transfer is happening simultanously. Remote-backup-mariadb itself will not return error code (!=0) because one or more backup fails to prevent alerts spam. As long as the basic execution and parsing of config and arguments happens correctly, and there are no uncaught exceptions, it will allways return 0. See the monitoring section for how to track backups are happening correctly. Manpages are available for all executables and config file with more details about available options and their meaning. === Adding a new database to Snapshot === Snapshotting doesn't require any special privileges other than the regular ones for a production mysql, as it is done (and must be done) as root. To add additional hosts to snapshot, just edit the <code>puppet://modules/profile/templates/mariadb/backups-cuminXXXX.cnf.erb</code> templates so they are run on the provided hosts, in the given order. === Recovering a Snapshot === At the moment, there is no specific utility to fully automate snapshot recoveries. There is, however, a transfer utility (<code>[[transfer.py]] --type=decompress</code>) that can automate the initial transfer and decompression from an existing snapshot: For now, the rest of the setup steps have to be done manually: * ''[from a cumin host]'': transfer.py --type=decompress dbprov1001.eqiad.wmnet:/srv/backups/snapshots/snapshot.s1.2020-09-11--23-45-01.tar.gz db1051.eqiad.wmnet:/srv/sqldata Remember the transfer will fail if existing data is left on that dir- it should be deleted or moved away first * Chown the datadir recursively to be owned by the mysql user * Just to be safe: systemctl set-environment MYSQLD_OPTS="--skip-slave-start" * Start mysql * Setup replication based on the GTID coordinates (remember GTID tracks already executed transactions, while binlogs tracks offsets or "gaps between transactions", do not confuse both methods). GTID is normally stored on the <code>xtrabackup_slave_pos</code> file. CHANGE MASTER TO MASTER_GTID=XXXXXX; START SLAVE; == Binary log backups and point in time recovery == NOT YET IMPLEMENTED == Incremental backups and external storage disaster recovery == NOT YET IMPLEMENTED == Monitoring and metadata gathering == === Metadata === On backup generation, an entry is added to the `backups` (and several on the `backup_files`) metadata tables (at the moment, on m1/dbbackups db). It contains the up to date status (ongoing, finished, failed) state of the backup genration and, when finished, some information about it and its generated files (date, size, etc.) The dbbackups database now has 3 tables: * <code>backups</code> * <code>backup_files</code> * <code>backup_objects</code> (not in use) <code>backups</code> contain an id and properties about backups (status -ongoing, finished (correctly), failed and deleted- dir name, source, section, start_time, end_time, total_size, etc: CREATE TABLE `backups` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(100) CHARACTER SET latin1 DEFAULT NULL, `status` enum('ongoing','finished','failed', 'deleted') COLLATE utf8mb4_unicode_ci NOT NULL, `source` varchar(100) CHARACTER SET latin1 DEFAULT NULL, `host` varchar(300) CHARACTER SET latin1 DEFAULT NULL, `type` enum('dump','snapshot', 'cold') COLLATE utf8mb4_unicode_ci DEFAULT NULL, `section` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `start_date` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01', `end_date` timestamp NULL DEFAULT NULL, `total_size` bigint(20) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `last_backup` (`type`,`section`,`status`,`start_date`) ) ENGINE=InnoDB AUTO_INCREMENT=174 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci <code>backup_files</code> contain a backup id and a list of files for that backup and its properties (date, size, name) CREATE TABLE `backup_files` ( `backup_id` int(10) unsigned NOT NULL, `file_path` varchar(300) CHARACTER SET latin1 NOT NULL DEFAULT '', `file_name` varchar(300) CHARACTER SET latin1 NOT NULL, `size` bigint(20) unsigned DEFAULT NULL, `file_date` timestamp NULL DEFAULT NULL, `backup_object_id` bigint(20) unsigned DEFAULT NULL, PRIMARY KEY (`backup_id`,`file_name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci <code>backup_objects</code> will link the backup files to specific objects (tables, databases, triggers, etc.) for further checking (at the moment, this is not filled in). This will be useful when we maintain an inventory of database objects for all servers so we can make sure no objects are left uncopied/have appropriate size, etc. CREATE TABLE `backup_objects` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `backup_id` int(10) unsigned NOT NULL, `db` varchar(100) CHARACTER SET latin1 NOT NULL, `name` varchar(100) CHARACTER SET latin1 DEFAULT NULL, `size` bigint(20) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `backup_id_db_name` (`backup_id`,`db`,`name`) ) ENGINE=InnoDB AUTO_INCREMENT=6352 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci Metadata is collected during generation, by the same script that generates the dumps, <code>backup-mariadb</code>. It connects to the m1 database (db1080:dbbackups at the moment) and logs the information. If the logging fails, the backup should continue, but '''it cannot log a successful backup without the backups being successful, too'''. The metadata database ("statistics" db) is controlled on the backup configuration, on the <code>/etc/wmfbackups/[remote_]backups.cnf</code>, the same place that controls the backups to be done. For separation, the actual db configuration is setup on a separate file, as configured by the statistics-file option, as it most likely contain user and password details. At the WMF infrastructure, this lives one the <code>/etc/wmfbackups/statistics.cnf</code> file. === Alerting === [[File:Screenshot_Icinga_backup_monitoring.png|thumb|right|Icinga checks database backups are generated correctly every week and there is a recent, successful, non-0 sized, no older than 8 days backup package]] Icinga checks are run through nrpe on backupmon1001, and are controlled by a hiera setting: [https://phabricator.wikimedia.org/source/operations-puppet/browse/production/hieradata/role/common/dbbackups/monitoring.yaml <code>/hieradata/role/common/dbbackups/monitoring.yaml</code>] This is used at the moment to alert if backups fail for any reason (icinga checks query m1 database with <code>check-mariadb-backups</code>) and it checks backups are generated correctly weekly and have a reasonable size. Further checks could be added later, but thorough backup testing should be done through proper recovery validation (full recovery to test hardware, starting replication and maybe some checksums/smoke tests). The test will fail if: * The latest backups is older (the <code>end_date</code> of the full backup process, not the moment they are consistent with) than the configured date. At the moment, that is 8 days (7 + 1 day of buffer) for dumps, and 3 (2 + 1) for snapshots. Check [https://phabricator.wikimedia.org/source/operations-puppet/browse/production/hieradata/role/common/dbbackups/monitoring.yaml Puppet] for the latest values used on production * If latest backup is not in a <code>finished</code> state (that means it finished without errors, including metadata gathering), it is ignored. The check, as part of the backup, includes a check of one of the files last being generated in the backup being present, aside from the exit code and no errors on logs (warnings are accepted). * The backup does not run at all, or it runs but it cannot insert its metadata on the backup tables * The backups is smaller than a certain size (1MB min_size), so it is not a 0 or a few bytes in size. * The backup is about the same size than the previous one finished (±15% for a crit, ±3% for a warning) - so something weird, like an explosion in growth, or large missing/deleted datasets are detected. * There is at least 2 last correct backups (it warns if there is only 1, as it cannot check the size is correct, for human checking the first backup generated correctly) Backup checks are configured separately than backup runs- this in on purpose, so backups are not deleted by accident (or a bug) and not noticed. This can lead to inconveniences, to be revisited later. '''Metadata gathering into the database is allowed to fail''' (it is a non-fatal error), letting the backup process continue. This means a backup can be successful, but as no metadata has been gathered, it will show as a failure (false positive). A false negative shouldn't occure. If the backup finished correctly, one can manually run its metadata by rerunning the backup with <code>backup-mariadb --only-postprocess</code> option, which will analyze the files and write them to the database, once the problem is corrected (grants, database down, network down, bug, etc.). More details can be seen in the source code of the check: * <code>[https://phabricator.wikimedia.org/diffusion/OSWB/browse/master/wmfbackups/check/check_mariadb_backups.py check_mariadb_backups.py]</code> === Dashboard === [[File:Dbbackups_dashboard_status.png|thumb|right|Home page]] [[File: Dbbackups dashboard detail.png|thumb|right|Backup job details]] To more easily check the status of backups and check files, jobs running, failed backups, etc. there is a work in progress dashboard for easier check. To access it (currently it is not publicly deployed) one has to run: ssh -L 8000:localhost:8000 backupmon1001.eqiad.wmnet So you forward your local port 8000 to backupmon1001.eqiad.wmnet's 8000, where currently it runs from an unprivileged user in debug mode. Then open your browser at: http://localhost:8000 The code for the dashboard rests [https://gitlab.wikimedia.org/repos/sre/pampinus on GitLab]. == Backup testing == NOT YET IMPLEMENTED == Backups quick cheatsheet == ''This is WIP'' The assignment of each backup and its preprocessing server can be located on Puppet at [[phab:source/operations-puppet/browse/production/modules/profile/templates/dbbackups/|/modules/profile/templates/dbbackups]]. * dbprov* contain metadata and misc db backups * backupXXX2 contain content db backups Dumps are stored long term on Bacula. === Provision a precompressed and prepared snapshot (preferred) === * Once the snapshot has been generated go to one of the cluster management hosts ({{CuminHosts}}) transfer.py --type=decompress dbprov2002.codfw.wmnet:/srv/backups/snapshots/latest/xxxx.tar.gz DESTINATION.FQDN:/srv/sqldata * Once the data has been copied over successfully, ssh to the host. systemctl start mariadb cat /srv/sqldata/xtrabackup_slave_info | grep GLOBAL | mysql * Now from the mysql prompt of that host (or from cumin) configure the replication thread: CHANGE MASTER TO MASTER_HOST='FQDN', MASTER_USER='<user>', MASTER_PASSWORD='<pass>', MASTER_SSL=1, master_use_gtid = slave_pos; start slave; === Copy data from a backup source to a host === You [https://www.rfc-editor.org/rfc/rfc2119 should not] use this method unless there is a good reason to (it is much slower, computational intensive and error prone): * Stop replication * Gather the coordinates * From one of the cluster management hosts ({{CuminHosts}}): Run transfer.py with type xtrabackup and using: <code>SOURCE_HOST:SOCKET DESTINATION_HOST:DATA_DIR</code> (if the transfer will not happen within the same datacenter please remove <code>--no-encrypt</code>): transfer.py --type=xtrabackup db1140.eqiad.wmnet:/run/mysqld/mysqld.x1.sock db1127.eqiad.wmnet:/srv/sqldata * Once the transfer is done, you must prepare the backup before starting mysql. Ssh to the host: xtrabackup --prepare --use-memory=300GB --target-dir=/srv/sqldata Prepare must run on a mariabackup version newer or equal to the source server version, and preferably of the same major version. === Rerun a failed backup === {{notice|To simply rerun a failed snapshot, execute on a screen on the local cumin host: <code>sudo remote-backup-mariadb [section]</code>, where section is the failed job (s1, zarcillo, etc.)}} It is not uncommon for a backup run to fail. This is because even if the backup process itself was 100% reliable, there can be external factors that make impossible for the backup to finish correctly (especially during the resource intensive process which is a large copy), such as: * Network goes down during the backup process, making it impossible to continue * A backup source db server suddenly has a lot of load; or maintenance (like a schema change) happens * The copy process is so intensive that network gets saturated and errors so much that cannot continue * A server process or its hardware fails in the middle of the copy Snapshots (xtrabackup) are more likely to fail due to temporary unavailability than mydumper (as it involves 3 servers: the backup scheduler, the database and the storage; plus it requires 10x the amount of bandwidth). For this reason, after the batch of xtrabackup backups is finished, '''a new attempt to copy is done once again'''. After it has failed twice, no new attempts happen automatically (to prevent infinite loops and taking over all resources). As of March 2022, '''failed backups are not cleaned up automatically''' for 2 reasons: allowing later debugging and saving the data, in case it could be reused later with human intervention. dbprov hosts should have enough space to accommodate this. '''Failed backups do not immediately alert''', they only log it to the server logs and to the database backups metadata db. Only if backups fail constantly will an alert be generated, as that backup will be detected as stale. ==== Rerunning a failed snapshot ==== * '''Research''' why the snapshot failed in the first place. Just rerunning it without looking at the logs may waste your time (e.g. if the issue is a lack of space). Confirm there is a reason that has been corrected (a software bug, service down, network issue, etc.). The most common issue with snapshots is network errors, as snapshots tend to use a lot of bandwidth. To debug issues one can look at: ** The icinga alert called "<code>snapshot of [section] in eqiad/codfw</code>": alerting of violation of freshness expectations ** The database of the database backups metadata (<code>dbbackups</code>) in ''m1'' (<code>db1128</code> for all datacenters), useful to know where the backup was running: <code>dbbackups> SELECT * FROM backups WHERE section='[section]' and type='snapshot' ORDER BY id DESC LIMIT 10;</code> ** The log of the software automation in the server executing the backup (<code>cuminXXXX</code>): <code>journalctl -u database-backups-snapshots</code>) or if the host has been recently restarted: <code>grep database-backups-snapshots daemon.log*</code> ** The partial database itself (if it is empty, what had been backed up so far, etc.) on dbprovXXXX:/srv/backups/snapshots/ongoing * If the transfer of files was ok, and the only thing that failed was the <code>xtrabackup --prepare</code> or any of the preparation steps (e.g. compression, write permission error), one can rerun exclusively those by executing: sudo chown -R dump: /srv/backups/snapshots/ongoing/snapshot.s9.2022-04-01--09-31-45 sudo -u dump backup-mariadb /srv/backups/snapshots/ongoing/snapshot.s9.2022-04-01--09-31-45 --type=snapshot --rotate --compress --retention=4 --only-postprocess --stats-file=/etc/wmfbackups/statistics.ini * In all cases, one can always '''generate a new backup from zero'''. The easiest way is to run from the cumin host on the local dc: sudo remote-backup-mariadb s9 :Where s9 is the section (or list of sections, separated by spaces to retry). To retry all sections (do a full manual run): <code>sudo remote-backup-mariadb all</code> * '''Cleanup''': if there are leftover files from previous failed or partially completed backups, and those will no longer be useful for debugging, they can (have to) be removed manually. Make sure when deleting files you don't accidentally delete "good" ones e.g. cd to the ongoing file so you don't affect latest/archive ones. ==== Rerunning a failed dump ==== * '''Research''' why the dump failed in the first place. Just rerunning it without looking at the logs may waste your time (e.g. if the reason is a lack of space). Confirm there is a reason that has been corrected (a software bug, service down, network issue, etc.). To debug issues one can look at: ** The icinga alert called "<code>dump of [section] in eqiad/codfw</code>": alerting of violation of freshness expectations ** The database of the database backups metadata (<code>dbbackups</code>) in ''m1'' (<code>db1128</code> for all datacenters), useful to know where the backup was running: <code>dbbackups> SELECT * FROM backups WHERE section='[section]' and type='dump' ORDER BY id DESC LIMIT 10;</code> ** The log of the software automation in the server executing the backup (<code>dbprovXXXX:/var/log/mariadb-backups/backups.log</code> ** The mydumper log, found on the <code>/srv/backups/dumps/ongoing/dump_[section].log</code> file ** The partial database itself (if it is empty, what had been backed up so far, etc.) on dbprovXXXX:/srv/backups/dumps/ongoing * '''If the mydumper output itself was ok''' (e.g. permission error, or something else in the postprocessing job), one can continue the backup process with the <code>--only-postprocess</code> on an existing backup, which will only archive, rotate and analyze the files without generating a new backup, like this in a screen session: sudo -u dump backup-mariadb /srv/backups/dumps/ongoing/dump.es5.2022-05-03--14-34-46 --only-postprocess --type=dump --rotate --retention=8 --stats-file=/etc/wmfbackups/statistics.ini :This also works if the backup was completed, but we want to regenerate its statistics again * In all cases, one can always '''generate a new backup from zero'''. The easiest way is to edit the configuration file (<code>/etc/wmfbackups/backups.cnf</code>) and only leave the backups one wants to retry, then execute in a screen session: sudo -u dump backup-mariadb :A new dump can take 2-3 hours for a metadata database, and up to 24 hours for a full ES server. * '''Cleanup''': if there is leftover files from previous failed or partially completed backups, and those will no longer be useful for debugging, they can (have to) be removed manually. Make sure when deleting files you don't accidentally delete "good" ones- e.g. cd to the ongoing file so you don't affect latest/archive ones. == See also == * [https://phabricator.wikimedia.org/diffusion/OSWB/ WMFBackups code repository] [https://gitlab.wikimedia.org/repos/sre/wmfbackups on gitlab] * [https://phabricator.wikimedia.org/diffusion/OSPM/ Pampinus repository][https://gitlab.wikimedia.org/repos/sre/pampinus on gitlab] * [[Bacula]] * [[transfer.py]] * [[Backup-mariadb]] * [[Recover dump.py]] * [[MariaDB/Backups/Example recovery|Example recovery]] [[Category:MySQL]] [[Category:Backups]] tqwolq2e7ndaajexsotfdiqllju5qja Nova Resource:Tools.stewardbots/SAL 498 73236 2243251 2243186 2024-11-11T11:06:23Z Stashbot 7414 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss 2243251 wikitext text/x-wiki === 2024-11-11 === * 11:06 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-11-10 === * 07:49 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 07:47 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-11-01 === * 19:17 wmbot~melos@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-31 === * 22:34 wmbot~melos@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-30 === * 13:55 wmbot~melos@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-17 === * 13:27 wmbot~melos@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-15 === * 08:45 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-05 === * 00:23 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-10-01 === * 07:39 wmbot~h2o@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # Disconnected === 2024-09-10 === * 07:39 wmbot~melos@tools-bastion-13: ./stewardbots/SULWatcher/manage.sh restart # Disconnected === 2024-09-09 === * 13:59 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-09-08 === * 21:07 wmbot~deltaquad@tools-bastion-12: ./stewardbots/StewardBot/manage.sh restart # Disconnected * 21:06 wmbot~deltaquad@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss * 09:49 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 06:18 wmbot~deltaquad@tools-bastion-12: ./stewardbots/StewardBot/manage.sh restart # Disconnected === 2024-08-30 === * 17:16 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-08-29 === * 14:36 wmbot~urbanecm@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # Disconnected * 07:15 wmbot~melos@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # Disconnected === 2024-08-27 === * 07:10 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-08-26 === * 22:01 wmbot~melos@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # Disconnected === 2024-08-18 === * 14:19 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss === 2024-08-02 === * 23:54 wmbot~anticomposite@tools-bastion-13: stewardbots/StewardBot/manage.sh stop && stewardbots/StewardBot/manage.sh start # [[phab:T371740|T371740]] * 22:06 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # IRC connection failed, try again * 21:31 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss === 2024-07-31 === * 10:35 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-07-29 === * 10:03 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-07-27 === * 15:53 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-07-20 === * 07:34 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-07-12 === * 23:56 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-07-11 === * 07:14 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # Never got to RC * 06:37 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart === 2024-07-10 === * 11:11 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss === 2024-07-09 === * 12:58 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-07-02 === * 16:53 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 16:52 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-29 === * 09:45 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-26 === * 12:39 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 11:35 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-15 === * 17:55 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-13 === * 17:36 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 17:35 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-12 === * 17:42 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-09 === * 13:29 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-07 === * 14:46 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 11:13 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-05 === * 18:35 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 12:59 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-04 === * 11:37 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-06-03 === * 11:07 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-02 === * 13:23 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-06-01 === * 15:46 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-05-28 === * 21:06 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-26 === * 13:00 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 12:59 wmbot~melos@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss === 2024-05-25 === * 09:31 wmbot~h2o@tools-bastion-13: Restarted StewardBot out of IRC === 2024-05-24 === * 10:59 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 10:59 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-23 === * 13:31 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-05-21 === * 10:58 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 10:26 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-20 === * 21:41 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-19 === * 13:06 wmbot~superpes@tools-bastion-13: Restarted SULWatcher out of IRC === 2024-05-17 === * 14:58 wmbot~superpes@tools-bastion-13: Restarted StewardBot out of IRC * 14:00 wmbot~superpes@tools-bastion-13: Restarted SULWatcher out of IRC * 09:49 wmbot~superpes@tools-bastion-13: Restarted StewardBot out of IRC === 2024-05-16 === * 13:40 wmbot~superpes@tools-bastion-13: Restarted SULWatcher out of IRC * 09:00 wmbot~superpes@tools-bastion-13: Restarted StewardBot out of IRC === 2024-05-15 === * 05:54 wmbot~h2o@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-05-13 === * 20:00 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-04 === * 13:38 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-03 === * 13:30 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-05-02 === * 11:54 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss === 2024-05-01 === * 18:48 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-30 === * 19:25 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 01:40 wmbot~anticomposite@tools-bastion-13: deploy {{Gerrit|01aace0}} === 2024-04-29 === * 14:13 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 08:50 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-28 === * 09:45 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 09:20 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 08:40 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-27 === * 13:13 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/StewardBot because of a connection loss * 10:06 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-26 === * 12:45 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-25 === * 13:12 wmbot~anticomposite@tools-bastion-13: deploy {{Gerrit|ed1afde}} * 13:04 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatcher3 disconnected * 12:54 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # SULWatcher3 not coming back up * 11:20 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-24 === * 19:26 wmbot~anticomposite@tools-bastion-13: rm -r venv/ venv-k8s-py37/ venv-py3/ venv-py37-k8s/ # Remove old unused python 2.7, 3.5, and 3.7 venvs ([[phab:T130031|T130031]]) * 19:26 wmbot~anticomposite@tools-bastion-13: rm -r venv/ venv-k8s-py37/ venv-py3/ venv-py37-k8s/ * 19:07 wmbot~anticomposite@tools-bastion-13: toolforge jobs delete sbar # This job never worked as configured and won't work as intended. * 18:37 wmbot~anticomposite@tools-bastion-13: Deploy 2f528f2..c8354da ([[phab:T329327|T329327]]) * 13:39 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 03:29 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-23 === * 20:30 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 20:30 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 17:38 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 10:35 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-22 === * 14:54 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:53 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-21 === * 14:06 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 14:05 wmbot~melos@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss * 12:59 wmbot~bsadowski1@tools-bastion-13: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-20 === * 22:34 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 17:34 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-19 === * 22:52 wmbot~melos@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 13:57 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 12:54 wmbot~anticomposite@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-18 === * 14:08 wmbot~melos@tools-bastion-13: SULWatcher/manage.sh restart # SULWatchers disconnected * 14:08 wmbot~melos@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-17 === * 18:16 wmbot~anticomposite@tools-bastion-13: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 12:56 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 08:31 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-16 === * 23:45 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-15 === * 21:08 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 13:37 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 12:19 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss * 06:30 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss * 03:06 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-14 === * 17:07 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 13:51 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-13 === * 13:14 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-12 === * 17:37 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 17:36 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-11 === * 14:46 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 10:35 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 07:02 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-10 === * 14:18 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:13 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:11 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 12:14 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-09 === * 14:18 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because the bot stopped reporting changes * 14:14 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/StewardBot because the bot stopped reporting changes * 08:01 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-08 === * 18:03 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-07 === * 14:04 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:03 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 02:19 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-06 === * 15:49 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-05 === * 15:06 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 14:02 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-04 === * 13:47 wmbot~deltaquad@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-04-03 === * 14:32 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:31 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-04-02 === * 12:59 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because of a connection loss === 2024-04-01 === * 18:13 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 11:48 wmbot~bsadowski1@tools-sgebastion-10: Restarted StewardBot/SULWatcher because it had a connection loss * 02:17 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 02:12 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-31 === * 01:54 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 01:54 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 01:54 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-30 === * 14:24 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:22 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-29 === * 13:39 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 13:17 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # disconnected === 2024-03-28 === * 13:21 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 13:21 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # disconnected === 2024-03-27 === * 11:42 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # disconnected === 2024-03-26 === * 14:22 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 14:20 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-25 === * 14:57 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:52 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-24 === * 14:49 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:47 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 01:57 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 01:57 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-23 === * 11:06 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-22 === * 15:29 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 15:28 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-21 === * 21:06 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:46 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:45 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-20 === * 14:53 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 13:52 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 09:30 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-19 === * 15:41 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 14:36 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 02:04 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-18 === * 08:56 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-17 === * 17:26 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 15:44 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 11:15 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-16 === * 16:15 wmbot~melos@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 16:13 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-15 === * 18:09 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 11:51 wmbot~hoo@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-14 === * 18:34 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 18:33 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-03-12 === * 15:21 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-10 === * 10:19 wmbot~h2o@tools-sgebastion-10: Restarted Stewardbot === 2024-03-09 === * 11:59 wmbot~melos@tools-sgebastion-10: Restarted Stewardbot * 01:00 wmbot~melos@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-08 === * 09:24 wmbot~melos@tools-sgebastion-10: Restarted Stewardbot (not feeding) === 2024-03-07 === * 18:44 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-03-06 === * 15:21 wmbot~melos@tools-sgebastion-10: Restarted SULWatcher disconnected === 2024-03-05 === * 14:12 wmbot~melos@tools-sgebastion-10: Restarted SULWatcher disconnected === 2024-03-04 === * 13:25 wmbot~melos@tools-sgebastion-10: Restarted SULWatcher === 2024-03-02 === * 10:19 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC === 2024-03-01 === * 12:43 wmbot~superpes@tools-sgebastion-10: Restarted Stewardbot which had quit from IRC * 12:18 wmbot~superpes@tools-sgebastion-10: Restarted Stewardbot which had quit from IRC * 10:03 wmbot~superpes@tools-sgebastion-10: Restarted Stewardbot and SULWatcher which had quit from IRC * 09:16 wmbot~superpes@tools-sgebastion-10: Restarted SULWatcher which had quit from IRC * 09:06 wmbot~melos@tools-sgebastion-10: Restarted StewardBot === 2024-02-29 === * 10:49 wmbot~superpes@tools-sgebastion-10: Restarted SULWatcher which had quit from IRC * 10:04 wmbot~taavi@tools-sgebastion-11: ./stewardbots/StewardBot/manage.sh restart === 2024-02-28 === * 09:58 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC === 2024-02-27 === * 12:13 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC === 2024-02-26 === * 14:11 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 12:23 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not working on IRC * 04:06 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-25 === * 15:19 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 13:28 wmbot~h2o@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-02-24 === * 16:49 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 16:36 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-23 === * 16:50 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC === 2024-02-22 === * 16:48 wmbot~anticomposite@tools-sgebastion-10: Deploy {{Gerrit|34c320e}} * 15:45 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 14:23 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-21 === * 17:09 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 17:09 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 02:00 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC === 2024-02-19 === * 14:53 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC * 01:44 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC * 01:41 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-17 === * 16:52 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-16 === * 13:32 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC === 2024-02-15 === * 19:11 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected * 09:51 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot and SULWatcher which had quit from IRC === 2024-02-14 === * 23:06 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-02-07 === * 19:34 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not feeding on IRC * 14:13 wmbot~superpes@tools-sgebastion-10: Restarted StewardBot not working on IRC === 2024-01-28 === * 03:40 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatchers disconnected === 2024-01-26 === * 21:35 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatcher disconnected === 2024-01-15 === * 23:00 wmbot~anticomposite@tools-sgebastion-10: SULWatcher/manage.sh restart # SULWatcher2 unresponsive * 22:54 wmbot~anticomposite@tools-sgebastion-10: ./stewardbots/StewardBot/manage.sh restart # Disconnected === 2024-01-09 === * 18:34 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-12-29 === * 20:33 wm-bot: <anticomposite> SULWatcher/manage.sh restart # Not connected. === 2023-12-12 === * 14:47 wm-bot: <anticomposite> Deploy {{Gerrit|342e464}} === 2023-12-01 === * 19:17 wm-bot: <superpes> Restarted StewardBot stucked on IRC * 15:30 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatchers not restarting on their own === 2023-11-22 === * 21:06 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatchers not restarting on their own === 2023-11-16 === * 17:26 wm-bot: <superpes> Restarted StewardBot stucked on IRC === 2023-11-14 === * 15:33 wm-bot: <superpes> Restarted SULWatcher out of IRC * 15:06 wm-bot: <taavi> ./stewardbots/StewardBot/manage.sh restart === 2023-11-13 === * 21:39 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatchers disconnected === 2023-11-11 === * 02:35 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatchers disconnected === 2023-11-06 === * 10:56 wm-bot: <superpes> Restarted StewardBot spamming old logs on IRC === 2023-11-04 === * 10:38 wm-bot: <superpes> Restarted SULWatcher stucked on IRC === 2023-11-02 === * 17:26 wm-bot: <superpes> Restarted SULWatcher on Toolforge after receiving Unknown internal error: (0, ''); traceback in console on IRC === 2023-10-31 === * 16:13 wm-bot: <superpes> Restarted StewardBot stucked on IRC * 14:17 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatchers disconnected === 2023-10-30 === * 21:29 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatcher disconnected === 2023-10-08 === * 18:41 wm-bot: <maurelio> Merged {{Gerrit|6170723}} and restarted StewardBot to apply it === 2023-10-04 === * 12:31 superpes restarted StewardBot and SULWatcher logged out of IRC === 2023-09-28 === * 14:41 wm-bot: <anticomposite> SULWatcher/manage.sh restart # Not connected. === 2023-09-27 === * 17:42 wm-bot: <maurelio> Restarted SULWatcher bots. === 2023-09-22 === * 02:47 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Timed out === 2023-09-15 === * 10:52 wm-bot: <superpes> Restarted SULWatcher stucked on IRC === 2023-09-10 === * 17:10 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-08-26 === * 01:12 wm-bot: <anticomposite> SULWatcher/manage.sh restart # Not connected. spam === 2023-08-23 === * 16:20 wm-bot: <urbanecm> Restart StewardBot, SULWatcher: RC consumer stuck === 2023-08-11 === * 13:08 wm-bot: <samtar> Restarted SULWatcher === 2023-07-26 === * 13:41 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart === 2023-07-13 === * 08:01 wm-bot: <superpes> Restarted SULWatcher out of IRC === 2023-07-10 === * 20:34 AntiComposite: Deploy {{Gerrit|2146af6}}, {{Gerrit|2aed494}} ([[phab:T341003|T341003]]) === 2023-07-03 === * 18:46 wm-bot: <anticomposite> start webservice since it appeared to not be running for some reason === 2023-06-29 === * 18:49 wm-bot: <samtar> $ SULWatcher/manage.sh restart === 2023-06-22 === * 01:25 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-06-21 === * 18:54 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-06-07 === * 17:32 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-05-31 === * 14:05 wm-bot: <anticomposite> SULWatcher/manage.sh restart # All SULWatchers disconnected === 2023-05-18 === * 22:06 Melos: SULWatcher: soft restart via IRC === 2023-05-08 === * 09:42 wm-bot: <superpes> Restarted StewardBot stucked on IRC === 2023-04-13 === * 18:06 wm-bot: <superpes> Restarted StewardBot stucked on IRC === 2023-04-05 === * 13:40 wm-bot: <maurelio> Restarting the bots === 2023-04-04 === * 09:32 wm-bot: <superpes> Restarted StewardBot stucked on IRC === 2023-04-03 === * 14:22 wm-bot: <superpes> Restarted StewardBot and SULWatchers which dropped out of IRC === 2023-03-28 === * 16:05 wm-bot: <maurelio> Restart StewardBot * 16:02 wm-bot: <maurelio> Start SULWatcher bots again * 15:58 wm-bot: <maurelio> Stop sulwatcher k8s deployment until network issues are resolved === 2023-03-23 === * 10:06 wm-bot: <superpes> Restarted StewardBot and SULWatcher stucked on IRC === 2023-03-20 === * 09:51 wm-bot: <urbanecm> Deploy {{Gerrit|d95dac4ea26}} === 2023-03-16 === * 18:15 wm-bot: <anticomposite> SULWatcher/manage.sh start # All SULWatchers disconnected * 18:14 wm-bot: <anticomposite> SULWatcher/manage.sh restart # All SULWatchers disconnected * 17:40 wm-bot: <maurelio> Restarted StewardBot; bot silent for several hours. === 2023-03-13 === * 18:22 wm-bot: <urbanecm> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-03-08 === * 15:56 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatcher 1 and 2 disconnected === 2023-03-01 === * 18:33 wm-bot: <maurelio> Update stewardbots to {{Gerrit|2dca7cd}} === 2023-02-23 === * 02:19 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatcher 2 and 3 disconnected === 2023-02-22 === * 16:52 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2023-02-19 === * 16:20 wm-bot: <maurelio> rm -v ...DATA.crontab ...DATA.olduser # cleanup: [[phab:T130031|T130031]] * 16:17 wm-bot: <maurelio> rm -v .bigbrotherrc.deprecated # [[phab:T130031|T130031]] === 2023-02-18 === * 16:09 wm-bot: <maurelio> Deleted crontab for stewardbots; backup saved at crontab_20230218.txt {{!}} [[phab:T130031|T130031]] * 16:09 wm-bot: <maurelio> Deleted crontab for stewardbots === 2023-02-13 === * Stuck pods deleted and bots restarted after T329535 got resolved. === 2023-02-09 === * 03:53 wm-bot: <anticomposite> SULWatcher/manage.sh restart # SULWatcher 2 disconnected * 01:03 herzog: Removed maintainer access to stewardbots from former stewards that are no longer active. === 2023-02-08 === * 19:27 wm-bot: <anticomposite> SULWatcher/manage.sh restart \# SULWatcher 2 disconnected === 2023-02-01 === * 21:36 AntiComposite: Restart stewardbot, rc listener died with 429 Client Error: Too Many Requests for url: https://stream.wikimedia.org/v2/stream/recentchange === 2023-01-03 === * 00:06 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2022-12-31 === * 18:35 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatchers 2 and 3 disconnected === 2022-12-08 === * 02:44 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # RCReader not RCReading === 2022-11-16 === * 22:45 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatcher3 disconnected === 2022-11-09 === * 04:23 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatchers disconnected === 2022-11-06 === * 21:25 AntiComposite: Add MarcoAurelio as maintainer * 21:20 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # RC not working === 2022-10-31 === * 15:46 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatchers disconnected * 15:31 wm-bot: <deltaquad> Restart StewardBot * 03:10 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout === 2022-10-29 === * 15:31 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatchers disconnected === 2022-10-28 === * 22:06 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # "Not connected." spam === 2022-10-27 === * 22:41 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # SULWatcher disconnected * 17:58 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # "Not connected." spam === 2022-10-21 === * 22:46 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # all bots down * 03:17 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # all bots down * 03:16 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot * 03:15 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot === 2022-10-20 === * 22:18 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot * 22:14 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # Bots joined but unresponsive * 22:10 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot * 22:09 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # all bots down * 16:06 wm-bot: <deltaquad> ./stewardbots/StewardBot/manage.sh restart # reboot of services === 2022-10-19 === * 01:57 wm-bot: <anticomposite> ./SULWatcher/manage.sh restart # all bots down * 01:56 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot === 2022-10-18 === * 21:35 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot === 2022-10-16 === * 23:58 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot * 18:05 wm-bot: <urbanecm> ./SULWatcher/manage.sh restart # all bots down === 2022-10-10 === * 23:04 wm-bot: <anticomposite> ./SULWatcher/manage.sh restert # Ping timeout, all bots disconnected * 23:03 wm-bot: <anticomposite> ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot * 20:01 wm-bot: <anticomposite> Restart all bots * 18:37 wm-bot: <anticomposite> ./stewardbots/SULWatcher/manage.sh restart # All bots disconnected * 16:39 wm-bot: <deltaquad> restart StewardBot === 2022-10-05 === * 16:56 wm-bot: <anticomposite> ./stewardbots/SULWatcher/manage.sh restart # All bots disconnected === 2022-10-01 === * 01:16 wm-bot: <anticomposite> ./stewardbots/SULWatcher/manage.sh restart === 2022-09-26 === * 20:30 wm-bot: <anticomposite> ./stewardbots/SULWatcher/manage.sh restart # Ping timeout not noticed by bot === 2022-09-01 === * 21:14 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot * 17:10 urbanecm: tools.stewardbots@tools-sgebastion-10:~$ ./stewardbots/SULWatcher/manage.sh restart * 17:08 urbanecm: tools.stewardbots@tools-sgebastion-10:~$ ./stewardbots/SULWatcher/manage.sh restart === 2022-08-29 === * 00:20 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot === 2022-08-27 === * 18:49 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot * 18:47 wm-bot: <anticomposite> SULWatcher/manage.sh restart # All bots disconnected === 2022-08-25 === * 19:49 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot === 2022-08-24 === * 00:28 wm-bot: <anticomposite> SULWatcher/manage.sh restart #all bots disconnected === 2022-08-13 === * 23:16 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot === 2022-08-01 === * 18:12 wm-bot: <anticomposite> stewardbots/StewardBot/manage.sh restart === 2022-07-25 === * 10:15 urbanecm: `tools.stewardbots@tools-sgebastion-10:~/stewardbots/SULWatcher$ bash manage.sh restart` === 2022-07-20 === * 10:43 wm-bot: <anticomposite> SULWatcher/manage.sh restart === 2022-07-19 === * 21:15 wm-bot: <anticomposite> SULWatcher/manage.sh restart * 14:42 wm-bot: <tools.stewardbots> stewardbots/StewardBot/manage.sh restart * 13:54 AntiComposite: SULWatcher/manage.sh restart === 2022-07-18 === * 23:45 AntiComposite: SULWatcher/manage.sh restart === 2022-07-11 === * 06:42 wm-bot: <TheresNoTime> Restarted SULWatcher, disconnected === 2022-07-01 === * 01:56 AntiComposite: stewardbots/StewardBot/manage.sh restart -- bot disconnected === 2022-06-05 === * 01:07 AntiComposite: deploy {{Gerrit|1a0e0eb}}, {{Gerrit|1ddaf1b}} === 2022-05-31 === * 15:02 AntiComposite: Restart StewardBot, disconnected from IRC === 2022-05-28 === * 15:51 AntiComposite: restart SULWatchers, all flooded out === 2022-05-20 === * 01:32 AntiComposite: restart SULWatcher, SULWatcher3 disconnected and RC reader 429'd === 2022-05-18 === * 17:39 AntiComposite: restart StewardBot, RC connection 429'd === 2022-05-15 === * 23:50 wm-bot: <anticomposite> Deploy {{Gerrit|d92b3c1}}, restart SULWatcher * 09:48 TheresNoTime: deployed {{Gerrit|e86e8ef}}, restarted StewardBot * 09:44 TheresNoTime: deployed {{Gerrit|5fb61ab}}, restarted SULWatcher === 2022-05-13 === * 20:50 AntiComposite: ./SULWatcher/manage.sh restart # all bots disconnected === 2022-05-12 === * 16:22 TheresNoTime: Deployed {{Gerrit|b30b346}} & restarted SULWatcher * 15:52 TheresNoTime: Deployed {{Gerrit|ef01194}} (within the last hour) === 2022-05-05 === * 01:43 AntiComposite: `./SULWatcher/manage.sh restart` db connection down === 2022-04-21 === * 03:57 AntiComposite: Deploy {{Gerrit|435a16d}} === 2022-04-20 === * 18:58 AntiComposite: ./SULWatcher/manage.sh restart === 2022-04-06 === * 19:51 AntiComposite: `./SULWatcher/manage.sh restart` all bots disconnected === 2022-03-24 === * 14:25 AntiComposite: ./SULWatcher/manage.sh restart # all three disconnected === 2022-03-23 === * 20:47 taavi: add samtar (User:TheresNoTime) as maintainer * 20:44 AntiComposite: `./stewardbots/StewardBot/manage.sh restart` timed out, did not auto rejoin === 2022-03-22 === * 14:52 AntiComposite: `./SULWatcher/manage.sh restart` Not connected === 2022-03-18 === * 15:22 AntiComposite: `./SULWatcher/manage.sh restart`, disconnected from IRC and RC === 2022-03-12 === * 03:04 wm-bot: <anticomposite> restart SULWatchers - SULWatcher was disconnected, the other two bots not speaking === 2022-03-02 === * 01:02 wm-bot: <anticomposite> ./stewardbots/SULWatcher/manage.sh restart # SULWatchers flooded out, did not reconnect automatically === 2022-02-27 === * 23:52 urbanecm: Add AntiCompositeNumber as a maintainer * 23:51 urbanecm: `tools.stewardbots@tools-sgebastion-07:~$ ./stewardbots/SULWatcher/manage.sh stop && ./stewardbots/SULWatcher/manage.sh start # SUL watchers were not present at IRC` === 2021-05-21 === * 04:49 Majavah: disable crontab auto starting jobs now running in kubernetes === 2021-03-27 === * 17:44 Majavah: deploy https://gerrit.wikimedia.org/r/c/labs/tools/stewardbots/+/670326 === 2021-03-05 === * 11:01 Majavah: restart both bots as they had disconnected from freenode === 2021-03-03 === * 12:26 Majavah: hard restart stewardbot, disconnected from the network but nothing in logs === 2021-02-26 === * 21:40 Majavah: restated stuck job stewardbot, sulwatcher seems to be doing fine === 2021-02-18 === * 08:59 Majavah: restart both stewardbot and sulwatcher manually === 2021-02-02 === * 17:18 Urbanecm: Add tks4fish and base as maintainers === 2021-01-21 === * 18:20 Urbanecm: Start StewardBot and SULWatcher again, maintenance over ([[phab:T269609|T269609]]) * 18:14 Urbanecm: Stopping StewardBot and SULWatcher for maintenance ([[phab:T269609|T269609]]) === 2021-01-19 === * 18:02 Urbanecm: Deploy {{Gerrit|b38bd19}} - SSL and SASL support, python logging, generic cleanup <noinclude>[[Category:SAL]]</noinclude> 89k3krhrl3s0ts98vauut9aqch5m9po Wikitech:Content administrators 4 400496 2243217 2242476 2024-11-11T07:12:35Z Revi C. 1919 Note about server-side 2243217 wikitext text/x-wiki {{archive|year=2024|note=This group was folded into the [[wikitech:administrators|administrators]] group after Wikitech was made a SUL wiki. (There is no logs on wiki of this transition, as it happened [[phab:T375950#10300129|server-side]].)}} On Wikitech '''content administrators''' are users with nearly the same permissions as [[Wikitech:Administrators|administrators]], however they're basically deprived of the ability of edit the wiki interface (<tt>'editinterface'</tt>) due to its sensitiveness in this Wiki. Content administrator permissions are appropriate for trusted users willing to perform general maintenance and housekeeping tasks. == Appointment == Requests for content administrator permissions are handled [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Requesting+%3Crights%3E+access+for+%3Cyour-wikitech-username%3E&owner=&description=*+Wikitech+username%3A%0D%0A*+Your+reasons+for+requesting+this+permission(s)%3A%0D%0A&projects=wikitech in Phabricator using this task template]. Content administrators can be appointed and removed by [[Wikitech:Bureaucrats|bureaucrats]]. == See also == * [[Wikitech:Administrators]]. * [[Wikitech:Bureaucrats]]. * [[Special:ListUsers/contentadmin|List of content administrators]]. [[Category:User groups]] g6n48pabh64k5xkz6kpqa6oqbo5fst7 Wikimedia Cloud Services team/Clinic duties 0 441012 2243246 2230523 2024-11-11T10:55:19Z FNegri-WMF 32595 /* Phabricator */ Link to new "need triage" filter 2243246 wikitext text/x-wiki The WMCS team practices a '''clinic duty''' rotation that runs from one weekly team meeting to the next. Each team member takes a turn sequentially performing these duties. Clinic Duty runs from one weekly meeting to the next. Your shift begins after the weekly meeting, and ends with the next. In a similar fashion, we have two '''oncall duty''' rotations, that also run for one week ([https://portal.victorops.com/api/v1/org/wikimedia/team/team-ZXiOeQOoOEqFncON/calendar/67EF88CA9AB3C469AC1AC50F6ACBC0B4.ics see the calendar]). == Clinic duty == === Start of clinic duty === * Archive the weekly meeting etherpad: ** copy the entire content of the etherpad to https://office.wikimedia.org/wiki/Wikimedia_Cloud_Services/Meeting_Notes/YYYY-MM-DD ** Create next week's etherpad at https://etherpad.wikimedia.org/p/WMCS-YYYY-MM-DD ** Copy-paste the content of the old Etherpad to the new Etherpad ** In the new Etherpad: *** delete anything outdated, such as the "Round the table" updates from the previous meeting *** modify the dates in the archive links at the top *** update the "starts"/"ends" dates in the Clinic Duty section *** remove the color highlight by clicking the "eye" icon in the Etherpad toolbar (Clear Authorship Colors) ** In the old (archived) Etherpad: *** delete all the content except the first four lines with the links to the archive * Update the topic (title) of the IRC channel {{Irc|wikimedia-cloud-admin}}: ** You need to first get op status on the channel: <code>/msg chanserv op #wikimedia-cloud-admin</code> (If that gives an error, you might not have the necessary permissions on the channel. Ask for help! :) ) ** Edit the channel topic: *** if you are using IRCCloud, click on the "cog" icon on the top right, then on "Set topic..." *** you can also use <code>/topic New topic...</code> or <code>/msg chanserv topic #wikimedia-cloud-admin New topic...</code> ** The topic should point to the etherpad for the next meeting and indicate the nick of the person on clinic duty (usually it will be your nick if you're setting this message) *** Example topic: <code>https://etherpad.wikimedia.org/p/WMCS-XXXX-XX-XX | Channel is logged at https://wm-bot.wmcloud.org/logs/%23wikimedia-cloud-admin/ | ping cteam | clinic duty: <irc nick></code> ** Remove your op status when you're done: <code>/msg chanserv deop #wikimedia-cloud-admin</code> === Meetings === * '''Monday''': attend the "SRE Staff Meeting" (if scheduled for that week) ** Report any WMCS updates that might be of interest to the wider SRE group: *** Copy the "outgoing updates" from the WMCS Etherpad to the "SRE Staff Meeting" document (linked in the calendar event) *** Feel free to add more items, or ask in IRC if you need suggestions *** If you want to speak during the SRE meeting, put one or more items in '''bold''', otherwise people will still see the items, but you will not be asked to speak about them ** Listen to anything that could be relevant for the WMCS team: *** Take notes during the meeting and add them to the WMCS etherpad under "Report from SRE meeting" * '''Thursday''' (at the end of the clinic duty shift): Facilitate the Weekly WMCS team meeting === Phabricator === * Complete tasks under "Clinic Duty" on [[phab:project/board/2773/|Phabricator board]] * Help triage [[phab:project/board/2773/?filter=oOWzPnSD7U4f|new / incoming tasks]] on phabricator ==== Community Requests ==== Check for and respond to incoming requests. For new project requests or quota requests, please seek and obtain at least one other person's approval before approving and granting the request. Ensure this permission is explicitly documented on the phabricator ticket. For all floating IP requests, and any request you are unsure about, please bring up to the weekly meeting. Requests that represent an increase of more than double the quota or more than 300GB of storage should also be reviewed at the weekly meeting. *[https://phabricator.wikimedia.org/project/board/2875 VPS Project Requests] **Related docs: [[Portal:Cloud VPS/Admin/Projects lifecycle#Creating a new project]] **Related docs: [[Portal:Cloud VPS/Admin/Projects lifecycle#Deleting_a_project]] *[https://phabricator.wikimedia.org/project/board/4481/ DB Requests] **Related docs: [[Add_a_wiki#Cloud_Services]] *[https://phabricator.wikimedia.org/project/board/2880/ VPS Quota Requests ] **Related docs: [[Portal:Cloud_VPS/Admin/Projects_lifecycle#Modifying_project_quotas]] **Related docs: [[Portal:Cloud_VPS/Admin/Trove#Adjusting_per-project_Trove_quotas]] **Related docs: [[Portal:Cloud_VPS/Admin/OpenTofu]] (adding/removing flavors) *[https://phabricator.wikimedia.org/project/board/4834/ Toolforge Quota Requests ] **Related docs: [[Portal:Toolforge/Admin/Kubernetes#Quota management]] **Related docs: [[Portal:Toolforge/Admin/Harbor#Quota management]] (for Harbor specifically) **Related docs: [[Portal:Data_Services/Admin/Wiki_Replicas#User_connection_limits]] *[https://toolsadmin.wikimedia.org/tools/membership/?status=open Toolforge account requests] **Related docs: [[Portal:Toolforge/Admin#Users_and_community]] === IRC === * {{Irc|wikimedia-cloud}} monitoring ** Respond to help requests ** Watch for pings to other team members and intercept if appropriate ** Watch for pings to <code>!help</code> ** Call people out for poor behavior in the channel ** Praise people for helping constructively == Oncall duty == Oncall duty is a different shift than clinic duty, during your shift, you will be paged for critical issues, and you are expected to monitor and react to alerts, as well as highly prioritize working on tasks that improve the current alert/monitoring/stability of the platforms. See [[Wikimedia Cloud Services team/EnhancementProposals/Decision record T310598 Team oncall alerting schedules and processes|Decision Record]] for more information. * [https://portal.victorops.com/api/v1/org/wikimedia/team/team-ZXiOeQOoOEqFncON/calendar/67EF88CA9AB3C469AC1AC50F6ACBC0B4.ics ICS calendar link] * [https://portal.victorops.com/dash/wikimedia#/team/team-ZXiOeQOoOEqFncON/on-call-schedule Schedule page on VictorOps] ===Alerts=== *phaultfinder (This [[Alertmanager#Notifications|bot automatically opens tasks]] for non-paging alerts). Please ensure all open requests get assigned and worked (whether yourself or someone else). **[[phab:maniphest/query/dMuTeFTmEoib/#R|Open tasks]] **[[phab:maniphest/query/mfeBtXOYX.PH/#R|All tasks]] *[https://alerts.wikimedia.org/?q=team%3Dwmcs&q=%40state%3Dactive alertmanager (team=wmcs)] **You might also find [https://logstash.wikimedia.org/goto/d5c0cb63bb13bc883685c3d90d87cfb7 this dashboard] useful to browse alert history *[https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?hostgroup=wmcs_eqiad&style=overview Icinga for WMCS hardware]. *Watch for wmcs-related emails (cron, puppet failing on our projects, etc.) and fix ===Cloud VPS alerts=== These include things WMCS isn't directly responsible for. For this reason, most of these alerts aren't critical and aren't WMCS's problem to solve. However, projects for which WMCS is the owner / admin, like tools, admin, etc, are important and we should respond as the responsible party. *[https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive Alertmanager for Cloud VPS projects] ===Dashboards=== This list isn't exhaustive. Dashboards can be utilized to debug or confirm issues within WMCS platforms. *Platform Health **[https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1 NFS Storage Utilization] **[https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1 Openstack] **[https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&folder=current&tag=ceph&tag=health Ceph Cluster] **[https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?search=open&folder=current&orgId=1&refresh=1m WMCS Grafana Dashboards] **[https://grafana-labs.wikimedia.org/d/000000012/tools-basic-alerts tools trends] ==Improvements== If nothing currently requires immediate attention, you should work on improving tooling in this area. Consider: *Moving alerts from [https://icinga.wikimedia.org/ Icinga] to [https://alerts.wikimedia.org/ Alertmanager] (e.g. [https://gerrit.wikimedia.org/r/c/operations/puppet/+/813275 novafullstack], [https://gerrit.wikimedia.org/r/c/operations/puppet/+/813228 ceph]) *Adding new alerts or removing stale alerts (e.g. [https://gerrit.wikimedia.org/r/c/operations/alerts/+/822319 Adding neutron alert], [https://gerrit.wikimedia.org/r/c/operations/alerts/+/812706 Adding ceph alerts], [https://gerrit.wikimedia.org/r/c/operations/alerts/+/813274 Adding novafullstack alerts]) *Improving [[Portal:Cloud VPS/Admin/Runbooks|runbooks]] and documentation *Writing cookbooks to automate tasks (e.g. [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/774385 remove grid errors], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/801785 remove grid node], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/810914 ceph_reboot], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/806429 increase quotas] ) *Cleaning up puppet code, add tests *Improve/fix/upgrade the dashboards for the team in [https://grafana-rw.wikimedia.org/d/-K8NgsUnz/home?orgId=1&search=open&tag=WMCS grafana]. 7bvhxuf46nx7wab1935z5szvydg8az8 2243248 2243246 2024-11-11T10:58:39Z FNegri-WMF 32595 /* Phabricator */ fix broken link (the phab: handle doesn't encode correctly ? characters) 2243248 wikitext text/x-wiki The WMCS team practices a '''clinic duty''' rotation that runs from one weekly team meeting to the next. Each team member takes a turn sequentially performing these duties. Clinic Duty runs from one weekly meeting to the next. Your shift begins after the weekly meeting, and ends with the next. In a similar fashion, we have two '''oncall duty''' rotations, that also run for one week ([https://portal.victorops.com/api/v1/org/wikimedia/team/team-ZXiOeQOoOEqFncON/calendar/67EF88CA9AB3C469AC1AC50F6ACBC0B4.ics see the calendar]). == Clinic duty == === Start of clinic duty === * Archive the weekly meeting etherpad: ** copy the entire content of the etherpad to https://office.wikimedia.org/wiki/Wikimedia_Cloud_Services/Meeting_Notes/YYYY-MM-DD ** Create next week's etherpad at https://etherpad.wikimedia.org/p/WMCS-YYYY-MM-DD ** Copy-paste the content of the old Etherpad to the new Etherpad ** In the new Etherpad: *** delete anything outdated, such as the "Round the table" updates from the previous meeting *** modify the dates in the archive links at the top *** update the "starts"/"ends" dates in the Clinic Duty section *** remove the color highlight by clicking the "eye" icon in the Etherpad toolbar (Clear Authorship Colors) ** In the old (archived) Etherpad: *** delete all the content except the first four lines with the links to the archive * Update the topic (title) of the IRC channel {{Irc|wikimedia-cloud-admin}}: ** You need to first get op status on the channel: <code>/msg chanserv op #wikimedia-cloud-admin</code> (If that gives an error, you might not have the necessary permissions on the channel. Ask for help! :) ) ** Edit the channel topic: *** if you are using IRCCloud, click on the "cog" icon on the top right, then on "Set topic..." *** you can also use <code>/topic New topic...</code> or <code>/msg chanserv topic #wikimedia-cloud-admin New topic...</code> ** The topic should point to the etherpad for the next meeting and indicate the nick of the person on clinic duty (usually it will be your nick if you're setting this message) *** Example topic: <code>https://etherpad.wikimedia.org/p/WMCS-XXXX-XX-XX | Channel is logged at https://wm-bot.wmcloud.org/logs/%23wikimedia-cloud-admin/ | ping cteam | clinic duty: <irc nick></code> ** Remove your op status when you're done: <code>/msg chanserv deop #wikimedia-cloud-admin</code> === Meetings === * '''Monday''': attend the "SRE Staff Meeting" (if scheduled for that week) ** Report any WMCS updates that might be of interest to the wider SRE group: *** Copy the "outgoing updates" from the WMCS Etherpad to the "SRE Staff Meeting" document (linked in the calendar event) *** Feel free to add more items, or ask in IRC if you need suggestions *** If you want to speak during the SRE meeting, put one or more items in '''bold''', otherwise people will still see the items, but you will not be asked to speak about them ** Listen to anything that could be relevant for the WMCS team: *** Take notes during the meeting and add them to the WMCS etherpad under "Report from SRE meeting" * '''Thursday''' (at the end of the clinic duty shift): Facilitate the Weekly WMCS team meeting === Phabricator === * Complete tasks under "Clinic Duty" on [[phab:project/board/2773/|Phabricator board]] * Help triage [https://phabricator.wikimedia.org/project/board/2773/?filter=oOWzPnSD7U4f new / incoming tasks] on phabricator ==== Community Requests ==== Check for and respond to incoming requests. For new project requests or quota requests, please seek and obtain at least one other person's approval before approving and granting the request. Ensure this permission is explicitly documented on the phabricator ticket. For all floating IP requests, and any request you are unsure about, please bring up to the weekly meeting. Requests that represent an increase of more than double the quota or more than 300GB of storage should also be reviewed at the weekly meeting. *[https://phabricator.wikimedia.org/project/board/2875 VPS Project Requests] **Related docs: [[Portal:Cloud VPS/Admin/Projects lifecycle#Creating a new project]] **Related docs: [[Portal:Cloud VPS/Admin/Projects lifecycle#Deleting_a_project]] *[https://phabricator.wikimedia.org/project/board/4481/ DB Requests] **Related docs: [[Add_a_wiki#Cloud_Services]] *[https://phabricator.wikimedia.org/project/board/2880/ VPS Quota Requests ] **Related docs: [[Portal:Cloud_VPS/Admin/Projects_lifecycle#Modifying_project_quotas]] **Related docs: [[Portal:Cloud_VPS/Admin/Trove#Adjusting_per-project_Trove_quotas]] **Related docs: [[Portal:Cloud_VPS/Admin/OpenTofu]] (adding/removing flavors) *[https://phabricator.wikimedia.org/project/board/4834/ Toolforge Quota Requests ] **Related docs: [[Portal:Toolforge/Admin/Kubernetes#Quota management]] **Related docs: [[Portal:Toolforge/Admin/Harbor#Quota management]] (for Harbor specifically) **Related docs: [[Portal:Data_Services/Admin/Wiki_Replicas#User_connection_limits]] *[https://toolsadmin.wikimedia.org/tools/membership/?status=open Toolforge account requests] **Related docs: [[Portal:Toolforge/Admin#Users_and_community]] === IRC === * {{Irc|wikimedia-cloud}} monitoring ** Respond to help requests ** Watch for pings to other team members and intercept if appropriate ** Watch for pings to <code>!help</code> ** Call people out for poor behavior in the channel ** Praise people for helping constructively == Oncall duty == Oncall duty is a different shift than clinic duty, during your shift, you will be paged for critical issues, and you are expected to monitor and react to alerts, as well as highly prioritize working on tasks that improve the current alert/monitoring/stability of the platforms. See [[Wikimedia Cloud Services team/EnhancementProposals/Decision record T310598 Team oncall alerting schedules and processes|Decision Record]] for more information. * [https://portal.victorops.com/api/v1/org/wikimedia/team/team-ZXiOeQOoOEqFncON/calendar/67EF88CA9AB3C469AC1AC50F6ACBC0B4.ics ICS calendar link] * [https://portal.victorops.com/dash/wikimedia#/team/team-ZXiOeQOoOEqFncON/on-call-schedule Schedule page on VictorOps] ===Alerts=== *phaultfinder (This [[Alertmanager#Notifications|bot automatically opens tasks]] for non-paging alerts). Please ensure all open requests get assigned and worked (whether yourself or someone else). **[[phab:maniphest/query/dMuTeFTmEoib/#R|Open tasks]] **[[phab:maniphest/query/mfeBtXOYX.PH/#R|All tasks]] *[https://alerts.wikimedia.org/?q=team%3Dwmcs&q=%40state%3Dactive alertmanager (team=wmcs)] **You might also find [https://logstash.wikimedia.org/goto/d5c0cb63bb13bc883685c3d90d87cfb7 this dashboard] useful to browse alert history *[https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?hostgroup=wmcs_eqiad&style=overview Icinga for WMCS hardware]. *Watch for wmcs-related emails (cron, puppet failing on our projects, etc.) and fix ===Cloud VPS alerts=== These include things WMCS isn't directly responsible for. For this reason, most of these alerts aren't critical and aren't WMCS's problem to solve. However, projects for which WMCS is the owner / admin, like tools, admin, etc, are important and we should respond as the responsible party. *[https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive Alertmanager for Cloud VPS projects] ===Dashboards=== This list isn't exhaustive. Dashboards can be utilized to debug or confirm issues within WMCS platforms. *Platform Health **[https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1 NFS Storage Utilization] **[https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1 Openstack] **[https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&folder=current&tag=ceph&tag=health Ceph Cluster] **[https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?search=open&folder=current&orgId=1&refresh=1m WMCS Grafana Dashboards] **[https://grafana-labs.wikimedia.org/d/000000012/tools-basic-alerts tools trends] ==Improvements== If nothing currently requires immediate attention, you should work on improving tooling in this area. Consider: *Moving alerts from [https://icinga.wikimedia.org/ Icinga] to [https://alerts.wikimedia.org/ Alertmanager] (e.g. [https://gerrit.wikimedia.org/r/c/operations/puppet/+/813275 novafullstack], [https://gerrit.wikimedia.org/r/c/operations/puppet/+/813228 ceph]) *Adding new alerts or removing stale alerts (e.g. [https://gerrit.wikimedia.org/r/c/operations/alerts/+/822319 Adding neutron alert], [https://gerrit.wikimedia.org/r/c/operations/alerts/+/812706 Adding ceph alerts], [https://gerrit.wikimedia.org/r/c/operations/alerts/+/813274 Adding novafullstack alerts]) *Improving [[Portal:Cloud VPS/Admin/Runbooks|runbooks]] and documentation *Writing cookbooks to automate tasks (e.g. [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/774385 remove grid errors], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/801785 remove grid node], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/810914 ceph_reboot], [https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/806429 increase quotas] ) *Cleaning up puppet code, add tests *Improve/fix/upgrade the dashboards for the team in [https://grafana-rw.wikimedia.org/d/-K8NgsUnz/home?orgId=1&search=open&tag=WMCS grafana]. ocq5o9q260ilg8d3tqzzr3sy1s5aimv Wikitech:Administrators 4 441136 2243222 2242491 2024-11-11T08:16:21Z Revi C. 1919 Historical contentadmin 2243222 wikitext text/x-wiki An '''Administrator''' is an editor who has the extra ability to: *Block and unblock other users *Delete and undelete pages on the site *Protect and unprotect pages *Edit pages in the MediaWiki namespace (e.g. [[MediaWiki:Sidebar|MediaWiki:Sidebar]]) *View deleted edits of a page *View any contributions by a user to pages which are deleted. [[:m:Special:MyLanguage/Stewards|Stewards]] and [[:m:Special:MyLanguage/global sysops|global sysops]] should feel free to deal with vandalism and similarly routine and uncontroversial things, if necessary (Note: These are global groups defined by the [[:mw:Special:MyLanguage/Extension:CentralAuth|CentralAuth extension]] and don't appear at [[Special:ListUsers]], but at [[Special:GlobalUsers]]). Before the [[Wikitech/SUL-migration|SUL]] migration of wikitech, there was [[Wikitech:Content administrators|contentadmin]] which was identical to this group. It existed because wikitech had special LDAP connection controlled by administrators. During the migration, LDAP control was unbundled from admins, and contentadmin group was merged into the administrator group. == Appointment == Requests for administrator permissions are handled [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Requesting+%3Crights%3E+access+for+%3Cyour-wikitech-username%3E&owner=&description=*+Wikitech+username%3A%0D%0A*+Your+reasons+for+requesting+this+permission(s)%3A%0D%0A&projects=wikitech in Phabricator using this task template]. == See also == * [[Special:ListUsers/sysop|List of users with administrator permissions]] * [[Wikitech:Bureaucrats]] [[Category:User groups]] 3a9f1cuz6qplzx9qr456dofi0e9g8oc 2243223 2243222 2024-11-11T08:17:33Z Revi C. 1919 Restrictions of contentadmin 2243223 wikitext text/x-wiki An '''Administrator''' is an editor who has the extra ability to: *Block and unblock other users *Delete and undelete pages on the site *Protect and unprotect pages *Edit pages in the MediaWiki namespace (e.g. [[MediaWiki:Sidebar|MediaWiki:Sidebar]]) *View deleted edits of a page *View any contributions by a user to pages which are deleted. [[:m:Special:MyLanguage/Stewards|Stewards]] and [[:m:Special:MyLanguage/global sysops|global sysops]] should feel free to deal with vandalism and similarly routine and uncontroversial things, if necessary (Note: These are global groups defined by the [[:mw:Special:MyLanguage/Extension:CentralAuth|CentralAuth extension]] and don't appear at [[Special:ListUsers]], but at [[Special:GlobalUsers]]). Before the [[Wikitech/SUL-migration|SUL]] migration of wikitech, there was [[Wikitech:Content administrators|contentadmin]] which was identical to this group (except they lacked <code>editinterface</code>, preventing edits to MediaWiki namespaces). It existed because wikitech had special LDAP connection controlled by administrators. During the migration, LDAP control was unbundled from admins, and contentadmin group was merged into the administrator group. == Appointment == Requests for administrator permissions are handled [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Requesting+%3Crights%3E+access+for+%3Cyour-wikitech-username%3E&owner=&description=*+Wikitech+username%3A%0D%0A*+Your+reasons+for+requesting+this+permission(s)%3A%0D%0A&projects=wikitech in Phabricator using this task template]. == See also == * [[Special:ListUsers/sysop|List of users with administrator permissions]] * [[Wikitech:Bureaucrats]] [[Category:User groups]] izqbpsa5ezncok3ga31to6fsnzmp90p Performance/Alerts 0 441827 2243215 2242706 2024-11-11T05:24:41Z Phedenskog 5986 Fix links and some email info. 2243215 wikitext text/x-wiki == Alerts in Grafana == We use Grafana for alerting on performance regressions. We collect performance metrics from our real users and store it in Graphite and Prometheus. We also collect performance metrics from synthetic performance tools in Graphite. We use both types of metrics and send alerts on those. === History === When we started out we only used [[Navigation Timing|RUM]] to find regressions. Back then (and now) we use https://github.com/wikimedia/mediawiki-extensions-NavigationTiming to collect the data. We collect metrics from a small portion of the users and pass on the metrics to our servers that later ends up https://graphiteapp.org/ and [https://prometheus.io Prometheus]. We collect Navigation Timing, a couple of User Timings, first paint, First Contentful Paint and Largest contentful Paint and CPU long tasks for browsers that supports that. ===== Finding regressions ===== The way we found regressions was to closely look at the graphs in Graphite/Grafana. Yep watching them real close. The best way for us is to compare current metrics with the metrics we had one week back in time. The traffic and usage pattern for Wikipedia is almost the same if we compare 7 days. Comparing 24 hours back in time can also work, depending on when you look (weekend traffic is different). Did we find any regressions? Yes we did. This is what one looked like for us: [[File:First-paint-2016-05-05.png|First paint change found on Graphite GUI|center]] Looks good right, we could actually see that we have a regression on first paint. What is kind of cool is that the human eye is pretty good at spotting differences between two lines. But we moved on to use [http://docs.grafana.org/alerting/rules alerts in Grafana] to automate how we find them. === Current setup === At the moment we alert on [https://grafana.wikimedia.org/dashboards/f/KROjKjAZz/performance-team-alerts WebPageReplay], direct synthetic tests, [https://grafana.wikimedia.org/dashboard/db/navigation-timing-alerts Navigation Timing] metrics, increase in CSS an JavaScript size, [https://grafana.wikimedia.org/dashboard/db/save-timing-alerts Save timings] and if synthetic tools are down/not working. ==== Alerts ==== We have alerts for RUM and synthetic testing. We have different ways of alerting depending on the tool and how we collect the data. ===== WebPageReplay ===== We run tests with alerts on enwiki, group 1 (it) and group 0. The idea is to see performance regressions early. There are three dashboards today for these alerts. The tests test Wikipedia on desktop and using emulated mobile mode for testing m.wikipedia (the plan is to move those tests to real Android devices instead): * [https://grafana.wikimedia.org/d/2kP3FjAZk/synthetic-performance-en-wikipedia-org-alerts-using-webpagereplay%3ForgId=1 enwiki] * [https://grafana.wikimedia.org/d/2kP3FjAXX/synthetic-performance-group-0-www-mediawiki-org-alerts-using-webpagereplay%3ForgId=1 Group 0] * [https://grafana.wikimedia.org/d/2kP3FjAZE/synthetic-performance-group-1-it-wikipedia-org-alerts-using-webpagereplay%3ForgId=1 Group 1] These alerts works like this: Every Sunday we take baseline. Through out the coming week, we test our metrics against that baseline using Mann Whitney U and Cliffs delta. Mann Whitney U is a statistical test that can find difference in metrics even if the data is not following normal distribution (performance tests often has outliers etc). If a performance regression is pushed, and Mann Whitney U signal a statistical change, the alert is fired. Cliffs delta measures the difference between two groups of metrics (baseline and our test) and categorise the change as small, medium or high change. That helps us to make sure we do not fire on small alerts, today our alerts fire on medium changes. . The next sunday, we will take a new baseline. You can trigger new baselines either by pushing a configuration change to git, let all tests create a new baseline, and push another change to git. Or manual configured new baselines on the server that runs the tests. For enwiki on desktop we test eight URLs, on en.m three URL , group 1 five URLs. Depending on the content on the page it can be easier/harder to see regressions so running tests on many pages are good and we are sure that we find things regressions that affects many users. Our WebPageReplay alert setup is explained more in [[Performance/Guides/WebPageReplay alert]]. ===== Direct synthetic tests ===== We also run tests direct against Wikipedia. These test tests three URLS and look at the median. The alerts are setup to fire alerts if the median increases by X ms. The threshold is a hard limit. This means if the alert fires, it will continue to fire until the metric is below the threshold or you change the alert. We alert on changes on enwiki and en.m.wiki using direct tests. For an alert to fire, all three URLs need to be above the threshold for that URL. Looking at medians are used to be our default way BUT we could have regressions that are not statistically significant and we can fire false alerts (we actually do/did). Metrics for a URL can change over time of the content change on the page, and the limit is set hard when we created the alerts. That means every alert that is fired, you need to look at the metric and see if it affected all URL and is a valid alert. You also manually need to set a new hard limit if the pages has a new baseline. The alerts for direct tests is found at [https://grafana.wikimedia.org/d/000000318/performance-synthetic-direct-test-alerts?orgId=1&refresh=5m https://grafana.wikimedia.org/d/000000318/performance-synthetic-direct-test-alerts] ===== Navigation Timing ===== Real user alerts have a couple of different setups. Either we compare back in time to the metrics value 7 days ago and alert if the metrics has increased by X ms. For the metrics that exits in Prometheus we alert if % of users that are in a specific metric bucket increased (for example we have 1% more users that have a first paint slower than compared to last week). We alert on our RUM metrics. We alert on first paint, TTFB and loadEventEnd. We set the alerts on p75 and p95 of the metrics we collect and alert on a 5-30% change depending on the metrics. Some metrics are really unstable and some are better. You can see our RUM alerts at https://grafana.wikimedia.org/d/000000326/navigation-timing-alerts Then there are alerts that use data stored in Prometheus. These alerts haven't been really tested by any team since they where done when the performance team was closed down. These alerts look at buckets and amount of users in % in those buckets. For example we alert on if we have  an increase in users with a first paint slower than  1 second compared to last week. Before Graphite is closed down (https://phabricator.wikimedia.org/T228380) we need to look at the current Prometheus alerts, and more alerts and make sure we alert on the things we think is necessary so that we don't miss out on anything with the sunsetting of Graphite. ===== Synthetic tools status alerts ===== We alert if our tools are down or are not working as expected. If an alerts fires, please go to https://grafana.wikimedia.org/d/frWAt6PMz/performance-synthetic-tool-alerts and see what's failing. ===== Save Timings ===== // TODO ===== JavaScript and CSS increase ===== We alert on increased JavaScript and CSS sizes. We test three URLs and have hard limits on when to alerts. These limits are set individually per URL and if all three URLs are above the limit, the alert will fire. We have the alerts to be able to find generic increases that affects all URLs. That means we need to tune the alerts if the size for one of the pages increases. Increased JavaScript sizes will also increase the number of CPU long tasks. For increased CSS it can potentially increase the first paint/first visual change and related metrics. We only run these tests on en.wiki and [https://grafana.wikimedia.org/d/2kP3FjAZk/synthetic-performance-en-wikipedia-org-alerts-using-webpagereplay?orgId=1 this] is the dashboard. ==== Alert emails ==== The alerts in Grafana will trigger an email. At the moment the most synthetic tests alerts (except the ones from the web teams dashboard) is sent to the team qte (as setup in Grafana). ==== Known problems ==== There's a couple of problems we have seen so far. ===== Self healing alerts ===== For some tests we go back X days back (usually 7 days back). That means that after 7 days, the alert is self healing (we will then compare with the metric that set off the alert). ===== Known non working queries ===== We have had [https://github.com/grafana/grafana/issues/9917 problems with nested queries] that works in the beginning but then stopped working (using Graphite built in percentage queries). To avoid that we now do alert queries like this: Create one query that goes back X days and make that hidden. Then make another query that divides with the first one and set the offset to -1. It looks like this: [[File:Alerts query setup WebPageTest.png|center|thumb|600x600px|Setup alert queries for WebPageTest example.]] === Creating an alert === In 2021 we moved to use [https://alerts.wikimedia.org AlertManager] in Grafana. To setup an alert for AlertManager you need to make sure you add AlertManager in the send to field in the alerts. [[File:Choosing Alert Manager in Grafana.png|center|thumb|Choose AlertManager when creating an alert.]] You also need to add tags to your alerts. To make sure the alerts reach the performance team you need to add two tags: '''team''' and '''severity'''. The '''team''' tag needs to have the value '''perf''' and the '''severity''' '''t'''ag should have the value '''critical''' or '''warning''' (depending on the severity of the alert). We also use tags to add extra info about the alerts. The following tags are used at the moment but feel free to add more: * '''metric''' - the value shows what metric that fired the alert. By tagging the metric, its easier to find other alerts that also fired at the same time * '''tool''' - what tool that fired the alert. Values can be '''rum''' (real user measurements), '''webpagetest, webpagereplay, sitespeed.io.''' * '''dashboard''' - the link to the current alert dashboard. At the moment Grafana/AlertManager aren't best friends so you need to give the link to the dashboard so you can find the actual dashboard directly from the alert. This is '''important''', please add the link. * '''drilldown''' - link to other dashboard that hold more info about the metric that fired. For example a dashboard that shows all metric collected for a URL that failed or RUM metrics that give more insights to what could be wrong. In the future we also want to include a link to the current runbook for each alert. The tags will look something like this when you are done: [[File:Show adding tags in Grafana of AlertManager.png|center|thumb|Adding tags to the alert.]]When you create your alert you need to make sure that it's evaluated often so that the AlertManager understands that it is the same alert that fires. At the moment we use evaluate every 3 minutes. If you don't evaluate often, alerts will fire and you will give multiple emails for the same alert. lwu7yp98ggbtle9xdv6xf1idv1deo37 Tool:Phab-ban/Log 116 453426 2243216 2239941 2024-11-11T07:00:48Z Phabbanbot 37210 Impactolog was disabled by revi 2243216 wikitext text/x-wiki <noinclude>'''Audit log of bans''' made via https://phab-ban.toolforge.org. Some bans made prior to 2023-09-01 were manually logged at [[phab:T200856]]. __NOTOC__</noinclude> === 2024-11-11 === * 07:00 [[phab:p/Impactolog|Impactolog]] was disabled by [[phab:p/revi/|revi]] === 2024-10-30 === * 09:05 [[phab:p/Jweighed1|Jweighed1]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-10-25 === * 04:20 [[phab:p/Blunt2531|Blunt2531]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-10-08 === * 08:31 [[phab:p/Surfcityrecovery|Surfcityrecovery]] was disabled by [[phab:p/MoritzMuehlenhoff/|MoritzMuehlenhoff]] === 2024-10-01 === * 21:49 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] === 2024-09-27 === * 10:20 [[phab:p/SorBP|SorBP]] was disabled by [[phab:p/TheresNoTime/|TheresNoTime]] === 2024-09-08 === * 10:45 [[phab:p/Robin_Mathew_Rajan|Robin_Mathew_Rajan]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-09-02 === * 17:58 [[phab:p/Idxntcx|Idxntcx]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-09-01 === * 10:11 [[phab:p/LDAP|LDAP]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-07-22 === * 08:56 [[phab:p/Nobleadele|Nobleadele]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-06-18 === * 20:54 [[phab:p/Playgiirlkaybrazy|Playgiirlkaybrazy]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-06-08 === * 22:30 [[phab:p/Exposingsesion1|Exposingsesion1]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-05-27 === * 12:24 [[phab:p/JosefineHellrothLarssonWMSE|JosefineHellrothLarssonWMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 07:54 [[phab:p/SMMpanels|SMMpanels]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-05-22 === * 11:31 [[phab:p/Sandra_Fauconnier_WMSE|Sandra_Fauconnier_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:23 [[phab:p/MiaJacobssonWMSE|MiaJacobssonWMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/David_Haskiya_WMSE|David_Haskiya_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/kalle|kalle]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/Tore_Danielsson_WMSE|Tore_Danielsson_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/Gitta|Gitta]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/annatroberg|annatroberg]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/AxelPettersson_WMSE|AxelPettersson_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:17 [[phab:p/SaraMortsell|SaraMortsell]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] === 2024-05-13 === * 14:55 [[phab:p/BenoitPrieur|BenoitPrieur]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2024-05-04 === * 19:54 [[phab:p/Sammoon391|Sammoon391]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-05-01 === * 07:24 [[phab:p/Soubag|Soubag]] was disabled by [[phab:p/Mainframe98/|Mainframe98]] === 2024-04-28 === * 06:22 [[phab:p/Diamondscoin|Diamondscoin]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-04-19 === * 03:47 [[phab:p/Wawmart2|Wawmart2]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-04-08 === * 10:21 [[phab:p/Mardetanha|Mardetanha]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2024-03-29 === * 09:03 [[phab:p/Abdollmjjedloveanan|Abdollmjjedloveanan]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-03-12 === * 09:45 [[phab:p/Samantha78462|Samantha78462]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:39 [[phab:p/Samantha7861654654|Samantha7861654654]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:11 [[phab:p/Robin|Robin]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:05 [[phab:p/Anglinakuki|Anglinakuki]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 00:02 [[phab:p/Johnne25|Johnne25]] was disabled by [[phab:p/bd808/|bd808]] === 2024-03-07 === * 20:07 [[phab:p/Sami785|Sami785]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-03-06 === * 07:41 [[phab:p/28q|28q]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-03-02 === * 22:51 [[phab:p/kitchenstrategic|kitchenstrategic]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-02-23 === * 09:03 [[phab:p/littleggghost|littleggghost]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-02-17 === * 05:10 [[phab:p/Skekeiei|Skekeiei]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-01-27 === * 03:50 [[phab:p/Andybitcoin|Andybitcoin]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-01-25 === * 09:10 [[phab:p/Mayo3030|Mayo3030]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-21 === * 05:59 [[phab:p/Hackear|Hackear]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 05:58 [[phab:p/joselopez45|joselopez45]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-20 === * 20:02 [[phab:p/08107130655|08107130655]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-18 === * 05:33 [[phab:p/Tecnologynew|Tecnologynew]] was disabled by [[phab:p/TheresNoTime/|TheresNoTime]] === 2024-01-12 === * 22:34 [[phab:p/cchen|cchen]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-01-11 === * 07:34 [[phab:p/Bernita43|Bernita43]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-01-07 === * 10:13 [[phab:p/Irademack|Irademack]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2023-12-31 === * 16:48 [[phab:p/Vieclamdmpt|Vieclamdmpt]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2023-12-26 === * 20:34 [[phab:p/Bgu5678|Bgu5678]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2023-11-26 === * 20:09 [[phab:p/Str13tlife|Str13tlife]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2023-11-24 === * 00:49 [[phab:p/Imambuchori03|Imambuchori03]] was disabled by [[phab:p/DannyS712/|DannyS712]] === 2023-11-22 === * 15:50 [[phab:p/Naleksuh|Naleksuh]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2023-11-19 === * 21:10 [[phab:p/Onack16888|Onack16888]] was disabled by [[phab:p/Daimona/|Daimona]] === 2023-11-15 === * 11:57 [[phab:p/Anonymous_ehacker|Anonymous_ehacker]] was disabled by [[phab:p/hashar/|hashar]] === 2023-11-09 === * 23:16 [[phab:p/dunicorn|dunicorn]] was disabled by [[phab:p/bd808/|bd808]] === 2023-09-30 === * 17:38 [[phab:p/Wykirany|Wykirany]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2023-09-01 === * 16:29 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] * 16:17 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] bypkmkbh7erabh0ppuwwf2801t1uf03 2243253 2243216 2024-11-11T11:09:34Z Phabbanbot 37210 Mvwservices was disabled by Peachey88 2243253 wikitext text/x-wiki <noinclude>'''Audit log of bans''' made via https://phab-ban.toolforge.org. Some bans made prior to 2023-09-01 were manually logged at [[phab:T200856]]. __NOTOC__</noinclude> === 2024-11-11 === * 11:09 [[phab:p/Mvwservices|Mvwservices]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 07:00 [[phab:p/Impactolog|Impactolog]] was disabled by [[phab:p/revi/|revi]] === 2024-10-30 === * 09:05 [[phab:p/Jweighed1|Jweighed1]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-10-25 === * 04:20 [[phab:p/Blunt2531|Blunt2531]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-10-08 === * 08:31 [[phab:p/Surfcityrecovery|Surfcityrecovery]] was disabled by [[phab:p/MoritzMuehlenhoff/|MoritzMuehlenhoff]] === 2024-10-01 === * 21:49 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] === 2024-09-27 === * 10:20 [[phab:p/SorBP|SorBP]] was disabled by [[phab:p/TheresNoTime/|TheresNoTime]] === 2024-09-08 === * 10:45 [[phab:p/Robin_Mathew_Rajan|Robin_Mathew_Rajan]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-09-02 === * 17:58 [[phab:p/Idxntcx|Idxntcx]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-09-01 === * 10:11 [[phab:p/LDAP|LDAP]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-07-22 === * 08:56 [[phab:p/Nobleadele|Nobleadele]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-06-18 === * 20:54 [[phab:p/Playgiirlkaybrazy|Playgiirlkaybrazy]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-06-08 === * 22:30 [[phab:p/Exposingsesion1|Exposingsesion1]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-05-27 === * 12:24 [[phab:p/JosefineHellrothLarssonWMSE|JosefineHellrothLarssonWMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 07:54 [[phab:p/SMMpanels|SMMpanels]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-05-22 === * 11:31 [[phab:p/Sandra_Fauconnier_WMSE|Sandra_Fauconnier_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:23 [[phab:p/MiaJacobssonWMSE|MiaJacobssonWMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/David_Haskiya_WMSE|David_Haskiya_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/kalle|kalle]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:20 [[phab:p/Tore_Danielsson_WMSE|Tore_Danielsson_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/Gitta|Gitta]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/annatroberg|annatroberg]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:19 [[phab:p/AxelPettersson_WMSE|AxelPettersson_WMSE]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] * 10:17 [[phab:p/SaraMortsell|SaraMortsell]] was disabled by [[phab:p/Sebastian_Berlin-WMSE/|Sebastian_Berlin-WMSE]] === 2024-05-13 === * 14:55 [[phab:p/BenoitPrieur|BenoitPrieur]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2024-05-04 === * 19:54 [[phab:p/Sammoon391|Sammoon391]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-05-01 === * 07:24 [[phab:p/Soubag|Soubag]] was disabled by [[phab:p/Mainframe98/|Mainframe98]] === 2024-04-28 === * 06:22 [[phab:p/Diamondscoin|Diamondscoin]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-04-19 === * 03:47 [[phab:p/Wawmart2|Wawmart2]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-04-08 === * 10:21 [[phab:p/Mardetanha|Mardetanha]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2024-03-29 === * 09:03 [[phab:p/Abdollmjjedloveanan|Abdollmjjedloveanan]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-03-12 === * 09:45 [[phab:p/Samantha78462|Samantha78462]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:39 [[phab:p/Samantha7861654654|Samantha7861654654]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:11 [[phab:p/Robin|Robin]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 09:05 [[phab:p/Anglinakuki|Anglinakuki]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 00:02 [[phab:p/Johnne25|Johnne25]] was disabled by [[phab:p/bd808/|bd808]] === 2024-03-07 === * 20:07 [[phab:p/Sami785|Sami785]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-03-06 === * 07:41 [[phab:p/28q|28q]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-03-02 === * 22:51 [[phab:p/kitchenstrategic|kitchenstrategic]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-02-23 === * 09:03 [[phab:p/littleggghost|littleggghost]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-02-17 === * 05:10 [[phab:p/Skekeiei|Skekeiei]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-01-27 === * 03:50 [[phab:p/Andybitcoin|Andybitcoin]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2024-01-25 === * 09:10 [[phab:p/Mayo3030|Mayo3030]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-21 === * 05:59 [[phab:p/Hackear|Hackear]] was disabled by [[phab:p/Peachey88/|Peachey88]] * 05:58 [[phab:p/joselopez45|joselopez45]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-20 === * 20:02 [[phab:p/08107130655|08107130655]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2024-01-18 === * 05:33 [[phab:p/Tecnologynew|Tecnologynew]] was disabled by [[phab:p/TheresNoTime/|TheresNoTime]] === 2024-01-12 === * 22:34 [[phab:p/cchen|cchen]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-01-11 === * 07:34 [[phab:p/Bernita43|Bernita43]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2024-01-07 === * 10:13 [[phab:p/Irademack|Irademack]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2023-12-31 === * 16:48 [[phab:p/Vieclamdmpt|Vieclamdmpt]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2023-12-26 === * 20:34 [[phab:p/Bgu5678|Bgu5678]] was disabled by [[phab:p/Peachey88/|Peachey88]] === 2023-11-26 === * 20:09 [[phab:p/Str13tlife|Str13tlife]] was disabled by [[phab:p/JJMC89/|JJMC89]] === 2023-11-24 === * 00:49 [[phab:p/Imambuchori03|Imambuchori03]] was disabled by [[phab:p/DannyS712/|DannyS712]] === 2023-11-22 === * 15:50 [[phab:p/Naleksuh|Naleksuh]] was disabled by [[phab:p/WMFOffice/|WMFOffice]] === 2023-11-19 === * 21:10 [[phab:p/Onack16888|Onack16888]] was disabled by [[phab:p/Daimona/|Daimona]] === 2023-11-15 === * 11:57 [[phab:p/Anonymous_ehacker|Anonymous_ehacker]] was disabled by [[phab:p/hashar/|hashar]] === 2023-11-09 === * 23:16 [[phab:p/dunicorn|dunicorn]] was disabled by [[phab:p/bd808/|bd808]] === 2023-09-30 === * 17:38 [[phab:p/Wykirany|Wykirany]] was disabled by [[phab:p/RhinosF1/|RhinosF1]] === 2023-09-01 === * 16:29 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] * 16:17 [[phab:p/T200856-01|T200856-01]] was disabled by [[phab:p/bd808/|bd808]] dcujkkeobut6dge5hwwd79b6bs30k33 Help:Project puppetserver 12 454293 2243201 2237090 2024-11-10T17:11:32Z Physikerwelt 244 /* Step 1: Setup a puppetserver */ update docu to generation g4 flavours 2243201 wikitext text/x-wiki {{Cloud VPS nav}} This page contains information about project puppetservers, helps you decide why you would use one, and provides basic trouble shooting information. Prior to Debian Bookworm, similar workflows were accomplished via the <code>role::puppetmaster::standalone</code> puppet role. This process is now deprecated, but the documentation can be found at [[Obsolete:Standalone_puppetmaster]]. == About project puppetservers == A '''project puppetserver''' can be used to deliver either a custom or experimental [[Puppet]] configuration to one or more hosts in a Cloud VPS project. All Cloud VPS instances run puppet via cron about every 30 minutes. The default configuration pulls from the shared Cloud VPS puppetserver which has a pristine checkout of <code>operations/puppet.git</code> and <code>labs/private.git</code>. Sometimes you want to have a puppetserver that you control as a Cloud VPS user. You can use <code>role::puppetserver::cloud_vps_project</code> to accomplish this. == Why would I want a project puppetserver? == These are the popular reasons for wanting to use one: # Testing puppet patches before they have been merged in <code>operations/puppet.git</code> # Keep project local secrets in a <code>labs/private.git</code> repo {{Note|content=As of puppet version 7, naming has changed from 'puppetmaster' to 'puppetserver'. The names are still used interchangeably in several places but typically 'puppetmaster' (<code>role::puppetmaster::standalone</code>) refers to a version 5 server and 'puppetserver' (<code>role::puppetserver::cloud_vps_project</code>) to a version 7 server.}} == How can I use a project puppetserver? == === Step 1: Setup a puppetserver === # Create a Debian Bookworm instance with at least 2 GB of RAM (g4.cores1.ram2.disk20 to support a handful of VMs, or a larger size for a bigger project) # (optional) create and mount a 5GB [[cinder]] volume on /srv. That will allow you to transfer certs and puppet code to future upgraded puppetservers. # Apply the role <code>role::puppetserver::cloud_vps_project</code> to the new VM # Apply the following hiera settings: #:<code>profile::puppet::agent::force_puppet7: true</code> #:<code>puppetmaster: puppet</code> # Force two consecutive puppet runs on the instance using <code>sudo -i run-puppet-agent</code> # Generate the server certs: <code>sudo puppetserver ca generate --certname $(hostname -f)</code> # Make sure puppetserver can restart: <code>sudo systemctl restart puppetserver</code> You will now find a checkout of operations/puppet.git in <code>/srv/git/operations/puppet</code> and a checkout of labs/private.git in <code>/srv/git/labs/private</code>. {{Note|content=A project puppetserver is '''not''' its own client by default; to configure this, see below.}} ==== Autosigning ==== If you are not planning on keeping '''any''' secrets in your puppetserver, you can turn on autosigning. This will automatically accept new clients as they contact the server, and save you a step later on when adding clients. You can do that by adding the following to the hiera configuration for the puppetserver instance: <code>profile::puppetserver::autosign: true</code> {{Note|content=If you are keeping secrets, you must not turn this on! If you do, anyone with network access to the puppetmaster can access all your secrets!|type=warning}} === Step 2: Setup a puppet client === This works for any number of instances, including the puppetserver itself. # Using the [[Horizon]] interface: ## Set the [[hiera]] variable <code>puppetmaster: <your puppetserver's FQDN></code>. (Scroll all the way to the bottom of the Puppet Configuration tab, to the Hiera Config button, and add the variable name/value in YAML format.) Use the fully qualified domain name of the puppetserver (i.e. <code>hostname.projectname.eqiad1.wikimedia.cloud</code> which can be found by running <code>hostname -f</code> on the puppetserver host). ##* This variable can be set either per-instance or project wide. # On the client instance: ## Run puppet to apply the modified hiera configuration: <code>sudo -i run-puppet-agent</code> ## Remove existing SSL certificates created with the prior puppetmaster: <code>sudo rm -rf /var/lib/puppet/ssl</code> (NOTE: you will see <code>certificate verify failed: [self signed certificate in certificate chain ...]</code> when you run puppet if you don't do this. ## '''Only''' if the client is the puppetserver itself, run also: ##:<syntaxhighlight lang="bash">sudo rm /srv/puppet/server/ssl/ca/signed/$(hostname -f).pem sudo mkdir -p /var/lib/puppet/ssl/private_keys sudo cp -v /srv/puppet/{server/,}ssl/private_keys/$(hostname -f).pem sudo mkdir /var/lib/puppet/ssl/certs/ sudo cp -v /srv/puppet/{server/,}ssl/certs/$(hostname -f).pem</syntaxhighlight> ## Force a puppet run to generate a new local SSL certificate: <code>sudo -i run-puppet-agent</code> # On the puppetserver (If you enabled autosigning in the master, you can skip this step): ## Check that the fingerprint of the client's key matches the one shown when it was generated running <code>sudo -i puppet cert list --all</code> (Puppet 5) or <code>sudo -i puppetserver ca list --all</code> (Puppet 7) ## Sign the key of the client: <code>sudo -i puppet cert sign <client-fqdn></code> (Puppet 5) or <code>puppetserver ca sign --certname <client-fqdn></code> (Puppet 7). # On the client: ## Run puppet twice to check that is connecting correctly to its new server and that the second run does not produce any new configuration changes: <code>sudo -i run-puppet-agent</code> === Deploying changes === Changes made in /srv/git/operations/puppet or /srv/git/labs/private will not immediately be applied by the puppetserver; first these changes must be deployed. Deploy by running <code>sudo puppetserver-deploy-code</code>. <code>puppetserver-deploy-code</code> will only deploy the '''production''' branch in <code>/operations/puppet</code> and only deploy the '''master''' branch in <code>/labs/private</code>. In most cases, git operations in /srv/git will trigger a hook that automatically deploys code. But, when in doubt, manually redeploy to be sure that you're applying the latest production branch. === Troubleshooting === ==== Forgetting to run <code>puppet agent</code> with sudo ==== Failing to run <code>puppet agent</code> as root can result in the surprising error: <pre> Debug: Creating new connection for https://puppet:8140 Error: Could not request certificate: getaddrinfo: Name or service not known Exiting; failed to retrieve certificate and waitforcert is disabled </pre> ==== Interacting with the local Git clones ==== The clones of the Puppet repositories in <code>/srv/git</code> must be owned by and interacted via the the <code>gitpuppet</code> user. A helper script <code>pgit</code> is made available in $PATH. ==== Hiera values from local commits don't seem to apply! ==== Hiera in Cloud VPS has different settings than in Wikimedia Production puppet, and it can be tricky to get your values into the right place in <code>labs/private.git</code> or puppet in general. First, check <code>/etc/puppet/hiera.yaml</code> on your standalone puppetserver for guidance. You may just need to add things to <code>labs/private/hieradata/labs/<VPS project name>/common.yaml</code>. You can verify that a node can "see" your hiera settings with the command <code>sudo puppet lookup --node <fqdn of puppet client> --compile --explain <your::hiera::key></code> on the project puppetserver. The compile is needed to set all the right facts for hiera in Cloud VPS, and there will be a lot of warnings you can ignore. The answer will be at the end of the command's output. === Adding a secret === To add a new secret to your puppetserver to use in your manifests, you have to add a new local commit to the repository under <code>/srv/git/labs/private</code> in the puppetserver node. Note that there might be a leftover git hook preventing you to add a new commit with the message: {{Codesample | code = Local commits are not allowed in this repository. Please go to frontend puppetmaster and commit | lang = bash }} This is a leftover from the puppet migration to puppet 7 and can be ignored, to do so just <code>rm /srv/git/labs/private/.git/hooks/pre-commit</code>, and it will not bother you again. == Advanced use cases == === Using puppetdb === WARNING: This is is difficult and complicated. Proceed with caution. Puppetdb can be enabled on standalone puppetservers on Cloud VPS by designating a puppetdb server with the <code>role::puppetdb</code> role and a lot of hiera values on both the DB server and the puppetmaster. This requires significant effort and is not a recommended configuration unless you have significant experience with puppet, need it and are able to maintain the setup, including your own postgresql database. It is guaranteed to break your puppet setup if you just enable it without following a particular order. Notes on that can be found on the [[Help:project puppetserver/PuppetDB|project puppetserver puppetdb notes page]]. === Git push from workstation to puppetserver === Typically development is done on non-Cloud VPS machines, this applies to Puppet as well, so a common use case is to develop locally and then push to your Cloud VPS project's puppetserver for testing. {{Note|content=When applying local changes to the puppet git repos, remember that only the '''production''' branch (in <code>operations/puppet</code> or the '''master''' branch (in <code>labs/private</code> will ever actually be deployed. Other local branches may be useful for tracking your work, but any changes must be cherry-picked to the production or master branch in order to be applied to puppet clients.}} ==== Push using multiple branches ==== If you want your Cloud VPS instance to act as yet another remote Git repository, you have to configure Git to use <code>sudo</code> as well. To do that, in your local computer's git clone of the Puppet repository <syntaxhighlight lang="shell-session"> $ git remote add my-instance my-instance.my-project.eqiad1.wikimedia.cloud:/srv/git/operations/puppet $ git config remote.my-instance.receivepack "sudo git-receive-pack" $ git config remote.my-instance.uploadpack "sudo git-upload-pack" </syntaxhighlight> After that, you can push and pull branches from and to your Cloud VPS instance. If you pull commits from your Cloud VPS instance, remember to reset authorship information with <code>git commit --amend --reset-author</code> before pushing to Gerrit. You cannot push to the branch that is currently checked out. To avoid having to log into your Cloud VPS puppetserver and checking out branches, etc., you can create a <code>post-receive</code> hook by copying: {{Collapse top|Example post-receive hook}} <syntaxhighlight lang="bash"> #!/bin/bash # # Copyright 2015, 2016 Tim Landscheidt # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # When a commit is pushed to a branch with a name starting with # "development/", this post-update script will replace all "local # hacks" that have a matching Gerrit "Change-Id:" footer with the ones # in this branch and add new "local hacks" on top. When a branch with # a name starting with "development/" is deleted, it will remove all # "local hacks" that are referenced in that branch. # Hooks are executed in the ".git" subdirectory, so move upwards to # allow rebases, etc. to work. As hooks have the environment variable # "GIT_DIR" set to ".", this means we need to amend it. cd .. export GIT_DIR=.git sedcommands=$(mktemp) # Iterate over all branches that were changed. while read oldref newref branch; do # Only process pushes to branches with a name starting with # "development/". if [ "$branch" = "${branch#refs/heads/development/}" ]; then continue fi # Iterate over all changes in the branch that are not part of # "production". if [ "$newref" = "0000000000000000000000000000000000000000" ]; then range=production..$oldref else range=production..$newref fi for commit in $(git log --format=format:%H --reverse $range); do # Check whether there is already a "local hack" with the same # Gerrit "Change-Id:" footer and remove it. changeid="$(git log -1 --format=format:%B $commit | sed -ne 's/^Change-Id: //p;')" localhack="$(git log origin/production..production --format=format:%H --grep "^Change-Id: $changeid\$")" # Was the branch deleted? if [ "$newref" = "0000000000000000000000000000000000000000" ]; then # Then delete the local hack if it exists. if [ -n "$localhack" ]; then echo "/^pick ${localhack:0:7} .*\$/d" >> "$sedcommands" fi else # Otherwise update or add the local hack. if [ -n "$localhack" ]; then echo "s/^pick ${localhack:0:7} .*\$/pick $commit/" >> "$sedcommands" else echo "\$apick $commit" >> "$sedcommands" fi fi done done if ! GIT_SEQUENCE_EDITOR="sed -i -f $sedcommands" git rebase -i origin/production; then git rebase --abort echo "$0: Could not rebase changes." >&2 fi rm -f $sedcommands </syntaxhighlight> {{Collapse bottom}} to <code>/srv/git/operations/puppet/.git/hooks/post-receive</code> and making it executable (<code>chmod +x /srv/git/operations/puppet/.git/hooks/post-receive</code>). With this hook in place, when you push commits to a branch with a name that starts with <code>development/</code>: * if a "local hack" (a commit that is in the <code>production</code> branch, but not in <code>origin/production</code>) with the same <code>Change-Id</code> already exists, it will replace that commit with the one in your pushed branch, or * if no local hack with the same <code>Change-Id</code> exists, it will apply it on top of other local hacks. If you delete a branch (<code>git push -f my-instance :development/branch</code>), the local hacks in that branch are removed. You can use the script <code>git-puppet-test</code> to combine pushing your local development branch to the Cloud VPS puppet master with running Puppet on several hosts and executing a test script: {{Collapse top|git-puppet-test script}} <syntaxhighlight lang="python"> #!/usr/bin/python3 # # Copyright 2014, 2015, 2016 Tim Landscheidt # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import argparse import re import subprocess import sys parser = argparse.ArgumentParser() parser.add_argument('--set-pre-command', dest='pre_command', help='set the pre-command') parser.add_argument('--set-post-command', dest='post_command', help='set the post-command') parser.add_argument('--set-hosts', dest='hosts', help='set the hosts') args = parser.parse_args() # Ensure there are no uncommitted changes. if subprocess.check_output(['git', 'status', '--porcelain']): print('Uncommitted changes.') sys.exit(1) # Determine change-ID. commit_message = subprocess.check_output(['git', 'cat-file', '-p', 'HEAD'], universal_newlines=True) change_id = None for line in commit_message.split('\n'): if line.startswith('Change-Id: '): change_id = line[len('Change-Id: '):] if change_id is None: print("Couldn't find Change-Id.") sys.exit(1) # Determine remote repository for puppetserver. puppetmaster_remote = None for line in subprocess.check_output(['git', 'remote', '-v'], universal_newlines=True).split('\n'): m = re.fullmatch(r'([^\t]+)\t' r'.*\.wmflabs:/srv/git/operations/puppet \(push\)', line) if m: puppetmaster_remote = m.group(1) if puppetmaster_remote is None: print("Couldn't determine remote repository for puppetmaster.") sys.exit(1) # Process --set-* options and terminate if they were given. if args.pre_command is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.pre-command' % change_id, args.pre_command]) if args.post_command is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.post-command' % change_id, args.post_command]) if args.hosts is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.hosts' % change_id, args.hosts]) if any((args.pre_command is not None, args.post_command is not None, args.hosts is not None)): sys.exit(0) # Run pre-command. try: pre_command = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.pre-command' % change_id], universal_newlines=True) subprocess.call(pre_command, shell=True) except subprocess.CalledProcessError: pass # Push change to puppetserver. subprocess.check_call(['git', 'push', '-f', puppetmaster_remote, 'HEAD:development/' + change_id]) # Run Puppet on configured hosts. try: hosts = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.hosts' % change_id], universal_newlines=True) subprocess.call(['clush', '-w', hosts, 'tmplog="$(mktemp)" && ' 'if ! sudo puppet agent -tv > "$tmplog" 2>&1; then ' 'cat "$tmplog"; ' 'fi && ' 'rm -f "$tmplog"']) except subprocess.CalledProcessError: pass # Run post-command. try: post_command = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.post-command' % change_id], universal_newlines=True) subprocess.call(post_command, shell=True) except subprocess.CalledProcessError: pass </syntaxhighlight> {{Collapse bottom}} This is especially useful if you work on a patch over a longer period of time, or the tests that you need to run are complex or boring. The shell command to run before and after pushing a branch and the hosts to run Puppet on are specified by the entries <code>puppet-test.$changeid.pre-command</code>, <code>puppet-test.$changeid.post-command</code> and <code>puppet-test.$changeid.hosts</code> in <code>.git/config</code> with <code>$changeid</code> being the <code>Change-Id</code> of the commit at <code>HEAD</code>. You can set those also by <code>git puppet-test --set-post-command $command</code>, etc. When a patch is merged that you previously pushed as a local hack, if it is identical, i. e. the changes are the same, it will just "disappear" from the list of local hacks. If it has been amended, the automatic pull with <code>git-sync-upstream</code> will fail, and you will have to log in and remove the local hack with <code>git rebase -i origin/production production</code> and deleting the line of the local hack in question. You need to apply the same steps if you want to abandon a local hack that you no longer want to work on. '''NOTE''': If the puppet master is shared among more than one person, pushing will overwrite or mess in other ways with each other's changes! ==== Push using a single branch ==== In case a puppetserver is used to '''test a single branch/patchset at a time''', a simple solution is to to push to an intermediate bare repo and then update the non-bare authoritative repo. The git hook added in {{Gerrit|159720}} achieves this. So assuming you have a <code>role::puppetserver::cloud_vps_project</code> configured, you can setup a local bare repo e.g. in your Cloud VPS home (as your user): <syntaxhighlight lang="shell-session"> cd ~ [ -d ~/puppet.git ] || git clone --bare --no-hardlinks --branch production /srv/git/operations/puppet/ ~/puppet.git install -m755 /srv/git/operations/puppet/modules/puppetmaster/files/self-master-post-receive ~/puppet.git/hooks/post-receive </syntaxhighlight> and on the development host add the relevant remote: <syntaxhighlight lang="shell-session"> git remote add project_puppetmaster ssh://PUPPET_MASTER_HOSTNAME/~/puppet.git </syntaxhighlight> then to test a puppet change in your Cloud VPS puppet master from your local host you can force-push to that remote in the "production" branch, e.g. <syntaxhighlight lang="shell-session"> git push -f project_puppetmaster HEAD:production </syntaxhighlight> == Renewing puppetserver CA certificate == Most new puppetserver have a CA cert that expires in 5 years. If your CA is expiring, it may be time to build a new puppetserver! Regenerating the CA cert is possible but require acting on every managed instance so is non-trivial. These instructions are paraphrased and adapted from [https://www.sysbee.net/blog/how-to-renew-puppet-ca-certificate/ | here] and tested with puppet version 5.5.22. They may require modification for newer puppet versions. On the puppetserver: <syntaxhighlight lang="shell-session"> $ FQDN="$(hostname -f)" # change this to the fqdn the clients use if it's not the same as the host fqdn $ SERIAL="$(openssl x509 -in /srv/puppet/server/ssl/ca/ca_crt.pem -noout -serial | cut -f2 -d '=')" $ openssl x509 -x509toreq -in /srv/puppet/server/ssl/ca/ca_crt.pem -signkey /srv/puppet/server/ssl/ca/ca_key.pem -out /srv/puppet/server/ssl/ca/ca_csr.pem $ cat > /root/puppet-ca-extension.cnf << EOF [CA_extensions] basicConstraints = critical,CA:TRUE nsComment = "Puppet Ruby/OpenSSL Internal Certificate" keyUsage = critical,keyCertSign,cRLSign authorityKeyIdentifier=keyid,issuer subjectKeyIdentifier = hash EOF $ openssl x509 -req -days 3650 -in /srv/puppet/server/ssl/ca/ca_csr.pem -signkey /srv/puppet/server/ssl/ca/ca_key.pem -out /srv/puppet/server/ssl/ca/ca_crt.pem -extfile /root/puppet-ca-extension.cnf -extensions CA_extensions -set_serial "$SERIAL" $ find /srv/puppet -name "$FQDN.pem" -exec rm {} \; $ puppet ca generate "$FQDN" --dns-alt-names "$FQDN" $ ## If your puppetserver is a client of itself: $ cp /srv/puppet/server/ssl/public_keys/"$FQDN.pem" /var/lib/puppet/ssl/public_keys/"$FQDN.pem" $ cp /srv/puppet/server/ssl/private_keys/"$FQDN.pem" /var/lib/puppet/ssl/private_keys/"$FQDN.pem" </syntaxhighlight> '''NOTE''': If you are refreshing <code>cloud-puppetmaster-*</code>, you might have to change the certificates in the secret repository to match the new ones, otherwise the next puppet run will change them: <syntaxhighlight lang="shell-session"> root@cloud-puppetmaster-03:~# run-puppet-agent ... Notice: /Stage[main]/Profile::Puppetmaster::Frontend/Puppetmaster::Web_frontend[puppetmaster.cloudinfra.wmflabs.org]/File[/var/lib/puppet/ssl/certs/puppetmaster.cloudinfra.wmflabs.org.pem]/ensure: defined content as '{md5}dc365c2c170e357710dd00754f35de70' (corrective) Notice: /Stage[main]/Profile::Puppetmaster::Frontend/Puppetmaster::Web_frontend[puppetmaster.cloudinfra.wmflabs.org]/File[/var/lib/puppet/ssl/private_keys/puppetmaster.cloudinfra.wmflabs.org.pem]/ensure: defined content as '{md5}0ec67b1df9f79a1b3fd33c52c475a220' (corrective) Notice: Applied catalog in 10.33 seconds </syntaxhighlight> To fix that you'll have to copy: <syntaxhighlight lang="shell-session"> /srv/puppet/server/ssl/private_keys/puppet.pem /srv/puppet/server/ssl/certs/puppet.pem /srv/puppet/server/ssl/private_keys/puppetmaster.cloudinfra.wmflabs.org.pem /srv/puppet/server/ssl/certs/puppetmaster.cloudinfra.wmflabs.org.pem # to the puppetmaster private repo (cloud-puppetmaster-03 right now): modules/secret/secrets/puppetmaster/puppet_privkey.pem modules/secret/secrets/puppetmaster/puppet_pubkey.pem modules/secret/secrets/puppetmaster/puppetmaster.cloudinfra.wmflabs.org_privkey.pem modules/secret/secrets/puppetmaster/puppetmaster.cloudinfra.wmflabs.org_pubkey.pem </syntaxhighlight> On each client of that puppetserver (not needed most of the time): <syntaxhighlight lang="shell-session"> $ # It is probably safe to run this tenant-wide; clearing the ca cert $ # on a host unaffected by a cert rotation appears to be harmless. $ rm /var/lib/puppet/ssl/certs/ca.pem </syntaxhighlight> {{:Help:Cloud Services communication}} == See also == * Testing catalog with [[Help:Puppet-compiler | puppet-compiler ]] [[Category:Cloud VPS]] [[Category:Documentation]] [[Category:Puppet]] [[Category:Documentation|Puppetmaster]] sa7thm62yzs2g8moqfxtahuje93kzi1 2243202 2243201 2024-11-10T18:09:54Z Physikerwelt 244 /* How can I use a project puppetserver? */ 2243202 wikitext text/x-wiki {{Cloud VPS nav}} This page contains information about project puppetservers, helps you decide why you would use one, and provides basic trouble shooting information. Prior to Debian Bookworm, similar workflows were accomplished via the <code>role::puppetmaster::standalone</code> puppet role. This process is now deprecated, but the documentation can be found at [[Obsolete:Standalone_puppetmaster]]. == About project puppetservers == A '''project puppetserver''' can be used to deliver either a custom or experimental [[Puppet]] configuration to one or more hosts in a Cloud VPS project. All Cloud VPS instances run puppet via cron about every 30 minutes. The default configuration pulls from the shared Cloud VPS puppetserver which has a pristine checkout of <code>operations/puppet.git</code> and <code>labs/private.git</code>. Sometimes you want to have a puppetserver that you control as a Cloud VPS user. You can use <code>role::puppetserver::cloud_vps_project</code> to accomplish this. == Why would I want a project puppetserver? == These are the popular reasons for wanting to use one: # Testing puppet patches before they have been merged in <code>operations/puppet.git</code> # Keep project local secrets in a <code>labs/private.git</code> repo {{Note|content=As of puppet version 7, naming has changed from 'puppetmaster' to 'puppetserver'. The names are still used interchangeably in several places but typically 'puppetmaster' (<code>role::puppetmaster::standalone</code>) refers to a version 5 server and 'puppetserver' (<code>role::puppetserver::cloud_vps_project</code>) to a version 7 server.}} == How can I use a project puppetserver? == === Step 1: Setup a puppetserver === # Create a Debian Bookworm instance with at least 2 GB of RAM (g4.cores1.ram2.disk20 to support a handful of VMs, or a larger size for a bigger project) # (optional) create and mount a 5GB [[cinder]] volume on /srv. That will allow you to transfer certs and puppet code to future upgraded puppetservers. (See [[Help:Adding_disk_space_to_Cloud_VPS_instances]] for detailed instructions) # Apply the role <code>role::puppetserver::cloud_vps_project</code> to the new VM # Apply the following hiera settings: #:<code>profile::puppet::agent::force_puppet7: true</code> #:<code>puppetmaster: puppet</code> # Force two consecutive puppet runs on the instance using <code>sudo -i run-puppet-agent</code> # Generate the server certs: <code>sudo puppetserver ca generate --certname $(hostname -f)</code> # Make sure puppetserver can restart: <code>sudo systemctl restart puppetserver</code> You will now find a checkout of operations/puppet.git in <code>/srv/git/operations/puppet</code> and a checkout of labs/private.git in <code>/srv/git/labs/private</code>. {{Note|content=A project puppetserver is '''not''' its own client by default; to configure this, see below.}} ==== Autosigning ==== If you are not planning on keeping '''any''' secrets in your puppetserver, you can turn on autosigning. This will automatically accept new clients as they contact the server, and save you a step later on when adding clients. You can do that by adding the following to the hiera configuration for the puppetserver instance: <code>profile::puppetserver::autosign: true</code> {{Note|content=If you are keeping secrets, you must not turn this on! If you do, anyone with network access to the puppetmaster can access all your secrets!|type=warning}} === Step 2: Setup a puppet client === This works for any number of instances, including the puppetserver itself. # Using the [[Horizon]] interface: ## Set the [[hiera]] variable <code>puppetmaster: <your puppetserver's FQDN></code>. (Scroll all the way to the bottom of the Puppet Configuration tab, to the Hiera Config button, and add the variable name/value in YAML format.) Use the fully qualified domain name of the puppetserver (i.e. <code>hostname.projectname.eqiad1.wikimedia.cloud</code> which can be found by running <code>hostname -f</code> on the puppetserver host). ##* This variable can be set either per-instance or project wide. # On the client instance: ## Run puppet to apply the modified hiera configuration: <code>sudo -i run-puppet-agent</code> ## Remove existing SSL certificates created with the prior puppetmaster: <code>sudo rm -rf /var/lib/puppet/ssl</code> (NOTE: you will see <code>certificate verify failed: [self signed certificate in certificate chain ...]</code> when you run puppet if you don't do this. ## '''Only''' if the client is the puppetserver itself, run also: ##:<syntaxhighlight lang="bash">sudo rm /srv/puppet/server/ssl/ca/signed/$(hostname -f).pem sudo mkdir -p /var/lib/puppet/ssl/private_keys sudo cp -v /srv/puppet/{server/,}ssl/private_keys/$(hostname -f).pem sudo mkdir /var/lib/puppet/ssl/certs/ sudo cp -v /srv/puppet/{server/,}ssl/certs/$(hostname -f).pem</syntaxhighlight> ## Force a puppet run to generate a new local SSL certificate: <code>sudo -i run-puppet-agent</code> # On the puppetserver (If you enabled autosigning in the master, you can skip this step): ## Check that the fingerprint of the client's key matches the one shown when it was generated running <code>sudo -i puppet cert list --all</code> (Puppet 5) or <code>sudo -i puppetserver ca list --all</code> (Puppet 7) ## Sign the key of the client: <code>sudo -i puppet cert sign <client-fqdn></code> (Puppet 5) or <code>puppetserver ca sign --certname <client-fqdn></code> (Puppet 7). # On the client: ## Run puppet twice to check that is connecting correctly to its new server and that the second run does not produce any new configuration changes: <code>sudo -i run-puppet-agent</code> === Deploying changes === Changes made in /srv/git/operations/puppet or /srv/git/labs/private will not immediately be applied by the puppetserver; first these changes must be deployed. Deploy by running <code>sudo puppetserver-deploy-code</code>. <code>puppetserver-deploy-code</code> will only deploy the '''production''' branch in <code>/operations/puppet</code> and only deploy the '''master''' branch in <code>/labs/private</code>. In most cases, git operations in /srv/git will trigger a hook that automatically deploys code. But, when in doubt, manually redeploy to be sure that you're applying the latest production branch. === Troubleshooting === ==== Forgetting to run <code>puppet agent</code> with sudo ==== Failing to run <code>puppet agent</code> as root can result in the surprising error: <pre> Debug: Creating new connection for https://puppet:8140 Error: Could not request certificate: getaddrinfo: Name or service not known Exiting; failed to retrieve certificate and waitforcert is disabled </pre> ==== Interacting with the local Git clones ==== The clones of the Puppet repositories in <code>/srv/git</code> must be owned by and interacted via the the <code>gitpuppet</code> user. A helper script <code>pgit</code> is made available in $PATH. ==== Hiera values from local commits don't seem to apply! ==== Hiera in Cloud VPS has different settings than in Wikimedia Production puppet, and it can be tricky to get your values into the right place in <code>labs/private.git</code> or puppet in general. First, check <code>/etc/puppet/hiera.yaml</code> on your standalone puppetserver for guidance. You may just need to add things to <code>labs/private/hieradata/labs/<VPS project name>/common.yaml</code>. You can verify that a node can "see" your hiera settings with the command <code>sudo puppet lookup --node <fqdn of puppet client> --compile --explain <your::hiera::key></code> on the project puppetserver. The compile is needed to set all the right facts for hiera in Cloud VPS, and there will be a lot of warnings you can ignore. The answer will be at the end of the command's output. === Adding a secret === To add a new secret to your puppetserver to use in your manifests, you have to add a new local commit to the repository under <code>/srv/git/labs/private</code> in the puppetserver node. Note that there might be a leftover git hook preventing you to add a new commit with the message: {{Codesample | code = Local commits are not allowed in this repository. Please go to frontend puppetmaster and commit | lang = bash }} This is a leftover from the puppet migration to puppet 7 and can be ignored, to do so just <code>rm /srv/git/labs/private/.git/hooks/pre-commit</code>, and it will not bother you again. == Advanced use cases == === Using puppetdb === WARNING: This is is difficult and complicated. Proceed with caution. Puppetdb can be enabled on standalone puppetservers on Cloud VPS by designating a puppetdb server with the <code>role::puppetdb</code> role and a lot of hiera values on both the DB server and the puppetmaster. This requires significant effort and is not a recommended configuration unless you have significant experience with puppet, need it and are able to maintain the setup, including your own postgresql database. It is guaranteed to break your puppet setup if you just enable it without following a particular order. Notes on that can be found on the [[Help:project puppetserver/PuppetDB|project puppetserver puppetdb notes page]]. === Git push from workstation to puppetserver === Typically development is done on non-Cloud VPS machines, this applies to Puppet as well, so a common use case is to develop locally and then push to your Cloud VPS project's puppetserver for testing. {{Note|content=When applying local changes to the puppet git repos, remember that only the '''production''' branch (in <code>operations/puppet</code> or the '''master''' branch (in <code>labs/private</code> will ever actually be deployed. Other local branches may be useful for tracking your work, but any changes must be cherry-picked to the production or master branch in order to be applied to puppet clients.}} ==== Push using multiple branches ==== If you want your Cloud VPS instance to act as yet another remote Git repository, you have to configure Git to use <code>sudo</code> as well. To do that, in your local computer's git clone of the Puppet repository <syntaxhighlight lang="shell-session"> $ git remote add my-instance my-instance.my-project.eqiad1.wikimedia.cloud:/srv/git/operations/puppet $ git config remote.my-instance.receivepack "sudo git-receive-pack" $ git config remote.my-instance.uploadpack "sudo git-upload-pack" </syntaxhighlight> After that, you can push and pull branches from and to your Cloud VPS instance. If you pull commits from your Cloud VPS instance, remember to reset authorship information with <code>git commit --amend --reset-author</code> before pushing to Gerrit. You cannot push to the branch that is currently checked out. To avoid having to log into your Cloud VPS puppetserver and checking out branches, etc., you can create a <code>post-receive</code> hook by copying: {{Collapse top|Example post-receive hook}} <syntaxhighlight lang="bash"> #!/bin/bash # # Copyright 2015, 2016 Tim Landscheidt # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # When a commit is pushed to a branch with a name starting with # "development/", this post-update script will replace all "local # hacks" that have a matching Gerrit "Change-Id:" footer with the ones # in this branch and add new "local hacks" on top. When a branch with # a name starting with "development/" is deleted, it will remove all # "local hacks" that are referenced in that branch. # Hooks are executed in the ".git" subdirectory, so move upwards to # allow rebases, etc. to work. As hooks have the environment variable # "GIT_DIR" set to ".", this means we need to amend it. cd .. export GIT_DIR=.git sedcommands=$(mktemp) # Iterate over all branches that were changed. while read oldref newref branch; do # Only process pushes to branches with a name starting with # "development/". if [ "$branch" = "${branch#refs/heads/development/}" ]; then continue fi # Iterate over all changes in the branch that are not part of # "production". if [ "$newref" = "0000000000000000000000000000000000000000" ]; then range=production..$oldref else range=production..$newref fi for commit in $(git log --format=format:%H --reverse $range); do # Check whether there is already a "local hack" with the same # Gerrit "Change-Id:" footer and remove it. changeid="$(git log -1 --format=format:%B $commit | sed -ne 's/^Change-Id: //p;')" localhack="$(git log origin/production..production --format=format:%H --grep "^Change-Id: $changeid\$")" # Was the branch deleted? if [ "$newref" = "0000000000000000000000000000000000000000" ]; then # Then delete the local hack if it exists. if [ -n "$localhack" ]; then echo "/^pick ${localhack:0:7} .*\$/d" >> "$sedcommands" fi else # Otherwise update or add the local hack. if [ -n "$localhack" ]; then echo "s/^pick ${localhack:0:7} .*\$/pick $commit/" >> "$sedcommands" else echo "\$apick $commit" >> "$sedcommands" fi fi done done if ! GIT_SEQUENCE_EDITOR="sed -i -f $sedcommands" git rebase -i origin/production; then git rebase --abort echo "$0: Could not rebase changes." >&2 fi rm -f $sedcommands </syntaxhighlight> {{Collapse bottom}} to <code>/srv/git/operations/puppet/.git/hooks/post-receive</code> and making it executable (<code>chmod +x /srv/git/operations/puppet/.git/hooks/post-receive</code>). With this hook in place, when you push commits to a branch with a name that starts with <code>development/</code>: * if a "local hack" (a commit that is in the <code>production</code> branch, but not in <code>origin/production</code>) with the same <code>Change-Id</code> already exists, it will replace that commit with the one in your pushed branch, or * if no local hack with the same <code>Change-Id</code> exists, it will apply it on top of other local hacks. If you delete a branch (<code>git push -f my-instance :development/branch</code>), the local hacks in that branch are removed. You can use the script <code>git-puppet-test</code> to combine pushing your local development branch to the Cloud VPS puppet master with running Puppet on several hosts and executing a test script: {{Collapse top|git-puppet-test script}} <syntaxhighlight lang="python"> #!/usr/bin/python3 # # Copyright 2014, 2015, 2016 Tim Landscheidt # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import argparse import re import subprocess import sys parser = argparse.ArgumentParser() parser.add_argument('--set-pre-command', dest='pre_command', help='set the pre-command') parser.add_argument('--set-post-command', dest='post_command', help='set the post-command') parser.add_argument('--set-hosts', dest='hosts', help='set the hosts') args = parser.parse_args() # Ensure there are no uncommitted changes. if subprocess.check_output(['git', 'status', '--porcelain']): print('Uncommitted changes.') sys.exit(1) # Determine change-ID. commit_message = subprocess.check_output(['git', 'cat-file', '-p', 'HEAD'], universal_newlines=True) change_id = None for line in commit_message.split('\n'): if line.startswith('Change-Id: '): change_id = line[len('Change-Id: '):] if change_id is None: print("Couldn't find Change-Id.") sys.exit(1) # Determine remote repository for puppetserver. puppetmaster_remote = None for line in subprocess.check_output(['git', 'remote', '-v'], universal_newlines=True).split('\n'): m = re.fullmatch(r'([^\t]+)\t' r'.*\.wmflabs:/srv/git/operations/puppet \(push\)', line) if m: puppetmaster_remote = m.group(1) if puppetmaster_remote is None: print("Couldn't determine remote repository for puppetmaster.") sys.exit(1) # Process --set-* options and terminate if they were given. if args.pre_command is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.pre-command' % change_id, args.pre_command]) if args.post_command is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.post-command' % change_id, args.post_command]) if args.hosts is not None: subprocess.check_call(['git', 'config', 'puppet-test.%s.hosts' % change_id, args.hosts]) if any((args.pre_command is not None, args.post_command is not None, args.hosts is not None)): sys.exit(0) # Run pre-command. try: pre_command = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.pre-command' % change_id], universal_newlines=True) subprocess.call(pre_command, shell=True) except subprocess.CalledProcessError: pass # Push change to puppetserver. subprocess.check_call(['git', 'push', '-f', puppetmaster_remote, 'HEAD:development/' + change_id]) # Run Puppet on configured hosts. try: hosts = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.hosts' % change_id], universal_newlines=True) subprocess.call(['clush', '-w', hosts, 'tmplog="$(mktemp)" && ' 'if ! sudo puppet agent -tv > "$tmplog" 2>&1; then ' 'cat "$tmplog"; ' 'fi && ' 'rm -f "$tmplog"']) except subprocess.CalledProcessError: pass # Run post-command. try: post_command = subprocess.check_output(['git', 'config', '--get', 'puppet-test.%s.post-command' % change_id], universal_newlines=True) subprocess.call(post_command, shell=True) except subprocess.CalledProcessError: pass </syntaxhighlight> {{Collapse bottom}} This is especially useful if you work on a patch over a longer period of time, or the tests that you need to run are complex or boring. The shell command to run before and after pushing a branch and the hosts to run Puppet on are specified by the entries <code>puppet-test.$changeid.pre-command</code>, <code>puppet-test.$changeid.post-command</code> and <code>puppet-test.$changeid.hosts</code> in <code>.git/config</code> with <code>$changeid</code> being the <code>Change-Id</code> of the commit at <code>HEAD</code>. You can set those also by <code>git puppet-test --set-post-command $command</code>, etc. When a patch is merged that you previously pushed as a local hack, if it is identical, i. e. the changes are the same, it will just "disappear" from the list of local hacks. If it has been amended, the automatic pull with <code>git-sync-upstream</code> will fail, and you will have to log in and remove the local hack with <code>git rebase -i origin/production production</code> and deleting the line of the local hack in question. You need to apply the same steps if you want to abandon a local hack that you no longer want to work on. '''NOTE''': If the puppet master is shared among more than one person, pushing will overwrite or mess in other ways with each other's changes! ==== Push using a single branch ==== In case a puppetserver is used to '''test a single branch/patchset at a time''', a simple solution is to to push to an intermediate bare repo and then update the non-bare authoritative repo. The git hook added in {{Gerrit|159720}} achieves this. So assuming you have a <code>role::puppetserver::cloud_vps_project</code> configured, you can setup a local bare repo e.g. in your Cloud VPS home (as your user): <syntaxhighlight lang="shell-session"> cd ~ [ -d ~/puppet.git ] || git clone --bare --no-hardlinks --branch production /srv/git/operations/puppet/ ~/puppet.git install -m755 /srv/git/operations/puppet/modules/puppetmaster/files/self-master-post-receive ~/puppet.git/hooks/post-receive </syntaxhighlight> and on the development host add the relevant remote: <syntaxhighlight lang="shell-session"> git remote add project_puppetmaster ssh://PUPPET_MASTER_HOSTNAME/~/puppet.git </syntaxhighlight> then to test a puppet change in your Cloud VPS puppet master from your local host you can force-push to that remote in the "production" branch, e.g. <syntaxhighlight lang="shell-session"> git push -f project_puppetmaster HEAD:production </syntaxhighlight> == Renewing puppetserver CA certificate == Most new puppetserver have a CA cert that expires in 5 years. If your CA is expiring, it may be time to build a new puppetserver! Regenerating the CA cert is possible but require acting on every managed instance so is non-trivial. These instructions are paraphrased and adapted from [https://www.sysbee.net/blog/how-to-renew-puppet-ca-certificate/ | here] and tested with puppet version 5.5.22. They may require modification for newer puppet versions. On the puppetserver: <syntaxhighlight lang="shell-session"> $ FQDN="$(hostname -f)" # change this to the fqdn the clients use if it's not the same as the host fqdn $ SERIAL="$(openssl x509 -in /srv/puppet/server/ssl/ca/ca_crt.pem -noout -serial | cut -f2 -d '=')" $ openssl x509 -x509toreq -in /srv/puppet/server/ssl/ca/ca_crt.pem -signkey /srv/puppet/server/ssl/ca/ca_key.pem -out /srv/puppet/server/ssl/ca/ca_csr.pem $ cat > /root/puppet-ca-extension.cnf << EOF [CA_extensions] basicConstraints = critical,CA:TRUE nsComment = "Puppet Ruby/OpenSSL Internal Certificate" keyUsage = critical,keyCertSign,cRLSign authorityKeyIdentifier=keyid,issuer subjectKeyIdentifier = hash EOF $ openssl x509 -req -days 3650 -in /srv/puppet/server/ssl/ca/ca_csr.pem -signkey /srv/puppet/server/ssl/ca/ca_key.pem -out /srv/puppet/server/ssl/ca/ca_crt.pem -extfile /root/puppet-ca-extension.cnf -extensions CA_extensions -set_serial "$SERIAL" $ find /srv/puppet -name "$FQDN.pem" -exec rm {} \; $ puppet ca generate "$FQDN" --dns-alt-names "$FQDN" $ ## If your puppetserver is a client of itself: $ cp /srv/puppet/server/ssl/public_keys/"$FQDN.pem" /var/lib/puppet/ssl/public_keys/"$FQDN.pem" $ cp /srv/puppet/server/ssl/private_keys/"$FQDN.pem" /var/lib/puppet/ssl/private_keys/"$FQDN.pem" </syntaxhighlight> '''NOTE''': If you are refreshing <code>cloud-puppetmaster-*</code>, you might have to change the certificates in the secret repository to match the new ones, otherwise the next puppet run will change them: <syntaxhighlight lang="shell-session"> root@cloud-puppetmaster-03:~# run-puppet-agent ... Notice: /Stage[main]/Profile::Puppetmaster::Frontend/Puppetmaster::Web_frontend[puppetmaster.cloudinfra.wmflabs.org]/File[/var/lib/puppet/ssl/certs/puppetmaster.cloudinfra.wmflabs.org.pem]/ensure: defined content as '{md5}dc365c2c170e357710dd00754f35de70' (corrective) Notice: /Stage[main]/Profile::Puppetmaster::Frontend/Puppetmaster::Web_frontend[puppetmaster.cloudinfra.wmflabs.org]/File[/var/lib/puppet/ssl/private_keys/puppetmaster.cloudinfra.wmflabs.org.pem]/ensure: defined content as '{md5}0ec67b1df9f79a1b3fd33c52c475a220' (corrective) Notice: Applied catalog in 10.33 seconds </syntaxhighlight> To fix that you'll have to copy: <syntaxhighlight lang="shell-session"> /srv/puppet/server/ssl/private_keys/puppet.pem /srv/puppet/server/ssl/certs/puppet.pem /srv/puppet/server/ssl/private_keys/puppetmaster.cloudinfra.wmflabs.org.pem /srv/puppet/server/ssl/certs/puppetmaster.cloudinfra.wmflabs.org.pem # to the puppetmaster private repo (cloud-puppetmaster-03 right now): modules/secret/secrets/puppetmaster/puppet_privkey.pem modules/secret/secrets/puppetmaster/puppet_pubkey.pem modules/secret/secrets/puppetmaster/puppetmaster.cloudinfra.wmflabs.org_privkey.pem modules/secret/secrets/puppetmaster/puppetmaster.cloudinfra.wmflabs.org_pubkey.pem </syntaxhighlight> On each client of that puppetserver (not needed most of the time): <syntaxhighlight lang="shell-session"> $ # It is probably safe to run this tenant-wide; clearing the ca cert $ # on a host unaffected by a cert rotation appears to be harmless. $ rm /var/lib/puppet/ssl/certs/ca.pem </syntaxhighlight> {{:Help:Cloud Services communication}} == See also == * Testing catalog with [[Help:Puppet-compiler | puppet-compiler ]] [[Category:Cloud VPS]] [[Category:Documentation]] [[Category:Puppet]] [[Category:Documentation|Puppetmaster]] j6fjyt7870f94r8npw2vqh7nex9afhi Wikitech:Rename requests 4 455666 2243200 2243193 2024-11-10T13:14:46Z Zabe 21763 /* Hex */ Reply 2243200 wikitext text/x-wiki Users can '''request a username change to match their SUL username'''. They must prove they are the same person by confirming the rename on wiki in both places: wikitech and one of SUL wikis. Renames are done by [[Wikitech:Bureaucrats|bureaucrats]] using [[Special:RenameUser]]. Actions are logged at [[Special:Log/renameuser]]. == Requests == <pre> === Example === {{user2|Foo}} -> {{user2|Bar}} [link to confirmation edit on a SUL wiki] ~~~~ </pre> === Jgiannelos === {{user2|Jgiannelos}} -> {{user2|JGiannelos (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:JGiannelos_(WMF)&diff=prev&oldid=27611127 Confirmation edit on Meta] [[User:Jgiannelos|Jgiannelos]] ([[User talk:Jgiannelos|talk]]) 09:28, 16 October 2024 (UTC) :{{Done}} @[[User:JGiannelos (WMF)|JGiannelos (WMF)]], your legacy Jgiannelos has been renamed. I also needed to rename [[User:JGiannelos (WMF) (usurped)]] to move it out of the way of unifying your account names. Please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. -- [[User:BryanDavis|BryanDavis]] ([[User talk:BryanDavis|talk]]) 17:45, 16 October 2024 (UTC) === LSobanski === {{user2|LSobanski}} -> {{user2|LSobanski (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3ALSobanski_%28WMF%29&diff=27578184&oldid=21985507 Confirmation edit on Meta] * {{done}} [[User:LSobanski (WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. --[[User:BryanDavis|BryanDavis]] ([[User talk:BryanDavis|talk]]) 18:18, 9 October 2024 (UTC) === Alangi Derick === {{user2|Alangi Derick}} -> {{user2|X-Savitar}} Not sure if this is an account merge or rename. But both are the same users and would like to transfer my edits from the former to the later. Conformation [[:meta:Special:Diff/27572366]]. Thank you! :{{Done}} [[User:X-Savitar]] You are renamed here, please login with the old password but new username (if it doesn't work, use Special:PasswordReset to reset it) and then once logged in, use Special:MergeAccount to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:34, 8 October 2024 (UTC) ::Thank you very much @[[User:Ladsgroup|Ladsgroup]]. I appreciate, everything looks good now. 🙏🏽 -- [[User:X-Savitar|X-Savitar]] ([[User talk:X-Savitar|talk]]) 18:51, 8 October 2024 (UTC) === Ahmon Dancy === {{user2|Ahmon Dancy}} -> {{user2|ADancy_(WMF)}} Confirmation [[:meta:Special:Diff/27570215]] Please and thank you. :{{Done}} [[User:ADancy_(WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use Special:PasswordReset to reset it) and then once logged in, use Special:MergeAccount to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:36, 8 October 2024 (UTC) === Launchpad === {{user2|Launchpad}} -> {{user2|Launchpad555}} :My confirmation: [[:meta:Special:Diff/27519615]]. ::{{Done}} [[User:Launchpad555]]: Renamed, please use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:44, 1 October 2024 (UTC) === NMW03 === {{user2|NMW03}} -> {{user2|Nemoralis}} :My confirmation: [[:meta:Special:Diff/27480283]]. Is it possible to change shell username? [[User:NMW03|NMW03]] ([[User talk:NMW03|talk]]) 16:19, 18 September 2024 (UTC) :@[[User:NMW03|NMW03]] Renaming shell username is not currently possible. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:56, 24 September 2024 (UTC) :{{Done}} @[[User:Nemoralis|Nemoralis]]: Done, please use [[Special:MergeAccount]] to unify [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:45, 1 October 2024 (UTC) :: Thanks! [[User:Nemoralis|Nemoralis]] ([[User talk:Nemoralis|talk]]) 10:34, 2 October 2024 (UTC) === Massslywmde === {{user2|Massslywmde}} -> {{user2|Mohammed Abdulai (WMDE)}} :My confirmation: [[:meta:Special:Diff/27493737]]. -[[User:Massslywmde|Massslywmde]] ([[User talk:Massslywmde|talk]]) 12:21, 21 September 2024 (UTC) :@[[User:Mohammed Abdulai (WMDE)|Mohammed Abdulai (WMDE)]] {{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:32, 1 October 2024 (UTC) ::Please [[Special:MergeAccount]] to unify your wikitech and SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:33, 1 October 2024 (UTC) === Samtar === {{user2|Samtar}} -> {{user2|TheresNoTime}} :[[:mw:Special:Diff/6768644|Confirmation]], but am now realising that I had created {{u|TheresNoTime}} and got it locked per [[:phab:T302109|all this fun]], so maybe it's not going to be worth risking it (: any thoughts? [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 10:09, 23 September 2024 (UTC) :@[[User:Samtar|Samtar]] That's easy, we can rename TheresNoTime to "TheresNoTime (usurped)" and then rename this account to that. Would that be fine? [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:56, 24 September 2024 (UTC) ::{{re|Ladsgroup}} oh yeah! That'd be perfect, thank you :-) [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 11:31, 24 September 2024 (UTC) :::{{re|Samtar}} Wouldn't it make sense to rename/link the account to one of your [[m:User:TheresNoTime/disclosure#Alts.|SUL alt accounts]] and then unblock to avoid having another SUL account? --[[User:Nintendofan885|Nintendofan885]] ([[User talk:Nintendofan885|talk]]) 18:22, 24 September 2024 (UTC) ::::Don't want to make things even more complex — I'll just let Ladsgroup do as suggested. Thanks for the suggestion though :-) -- [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 13:05, 1 October 2024 (UTC) :{{Done}} @[[User:TheresNoTime|TheresNoTime]] Done now. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 18:14, 1 October 2024 (UTC) === Zoranzoki21 === {{user2|Zoranzoki21}} -> {{user2|Kizule (usurped)}} Per [[phab:T260647]]. I have <code>Kizule (usurped)</code> already available on Wikimedia's wikis, so SUL won't be an issue. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 22:54, 24 September 2024 (UTC) :{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:21, 2 October 2024 (UTC) ::It looks like that I made a mistake. Kizule (usurped) is actually someone else's account (in the past when my username was Zoranzoki21 and I wanted to rename account to Kizule to reflect my real-life nickname, Kizule was unavailable, therefore stewards moved it so they can "make a place". ::Can you rename Kizule (usurped) to Kizule (test) which is trully mine, and I can confirm that it's mine for real, therefore finish the process of merging via Special:MergeAccount? [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:40, 7 October 2024 (UTC) :::Actually.. I'm placing this on a hold, I want to rename Kizule (test) on WMF's wikis, therefore this has to wait. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:41, 7 October 2024 (UTC) ::::I made a request to get Kizule (test) renamed to Kizule2. Can you rename Kizule (usurped) to Kizule2? Once it's renamed on other SUL wikis as well, I'll use Special:MergeAccount and complete my part of the process. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:46, 7 October 2024 (UTC) :::::Actually... Sorry for confusion. Kizule2 is actually unavailable, and it's a bug that it allowed me to ask for <code>Kizule 2</code> when <code>Kizule2</code> exists. :::::Okay, let's just rename Kizule (usurped) to Kizule (test) and finish this for once. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 18:27, 7 October 2024 (UTC) ::::::{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:10, 8 October 2024 (UTC) :::::::Thank you so much, my part is done as well! :) [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 13:59, 8 October 2024 (UTC) === Nhatminh01 === {{user2|Nhatminh01}} -> {{user2|JrandWP}} Per [[mw:Topic:Wmjt52wwyz7rssiz]]. [[User:Nhatminh01|Nhatminh01]] ([[User talk:Nhatminh01|talk]]) 09:43, 1 October 2024 (UTC) :{{Not done}} @[[User:JrandWP|JrandWP]] Now that you made an account, I can't rename your old one. It has to stay. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 13:51, 1 October 2024 (UTC) === Jcrespo === {{user2|Jcrespo}} -> {{user2|JCrespo (WMF)}} See [https://en.wikipedia.org/w/index.php?title=User%3AJCrespo_%28WMF%29&diff=1248772128&oldid=1233297733 confirmation] and [https://phabricator.wikimedia.org/p/jcrespo/ phab profile] -- [[User:Jcrespo|Jcrespo]] 11:43, 1 October 2024 (UTC) :@[[User:JCrespo (WMF)|JCrespo (WMF)]]: {{Done}} please use [[Special:MergeAccount]] to unify your accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:38, 1 October 2024 (UTC) ::Thank you! All good. -- [[User:JCrespo (WMF)|JCrespo (WMF)]] ([[User talk:JCrespo (WMF)|talk]]) 13:07, 1 October 2024 (UTC) === Arturo Borrero Gonzalez === {{user2|Arturo Borrero Gonzalez}} -> {{user2|ABorrero (WMF)}} See [https://meta.wikimedia.org/w/index.php?title=User%3AABorrero_%28WMF%29&diff=27540035&oldid=27159444 confirmation] and [https://phabricator.wikimedia.org/p/aborrero/ phab profile ] [[User:ABorrero (WMF)|ABorrero (WMF)]] ([[User talk:ABorrero (WMF)|talk]]) 14:51, 1 October 2024 (UTC) :{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:44, 2 October 2024 (UTC) === David Caro === {{user2|David Caro}} -> {{user2|DCaro (WMF)}} See [https://meta.wikimedia.org/w/index.php?title=User:DCaro_(WMF)&oldid=27540174 confirmation] and [https://phabricator.wikimedia.org/p/dcaro/ phab profile ] [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:18, 1 October 2024 (UTC) :@[[User:DCaro (WMF)|DCaro (WMF)]]: {{Done}}, please login with the new username and old password and use [[Special:MergeAccount]] to unify with SUL. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:14, 2 October 2024 (UTC) ::It worked, thanks! [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 07:51, 14 October 2024 (UTC) === Bartosz Dziewoński === {{user2|Bartosz Dziewoński}} -> {{user2|Matma Rex}} See https://phabricator.wikimedia.org/p/matmarex/ as confirmation, which is linked to both accounts. Unfortunately the "Matma Rex" account has been already created here automatically when I visited the wiki, so it will have to be renamed away first. [[User:Bartosz Dziewoński|Bartosz Dziewoński]] ([[User talk:Bartosz Dziewoński|talk]]) 19:04, 1 October 2024 (UTC) :@[[User:Matma Rex]] {{Done}} now. Now you can unify your accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 21:02, 1 October 2024 (UTC) ::Done, thanks! [[User:Matma Rex|Matma Rex]] ([[User talk:Matma Rex|talk]]) 14:07, 2 October 2024 (UTC) === RLazarus === {{user2|RLazarus}} -> {{user2|RLazarus (WMF)}} ([https://meta.wikimedia.org/w/index.php?title=User:RLazarus_(WMF)&diff=prev&oldid=27541361 confirmation]) As discussed with [[User:Ladsgroup|Ladsgroup]] on IRC, I incorrectly Special:MergeAccounts'd the RLazarus_(WMF) account here but haven't edited with it. [[User:RLazarus|RLazarus]] ([[User talk:RLazarus|talk]]) 21:19, 1 October 2024 (UTC) :{{Done}} @[[User:RLazarus (WMF)|RLazarus]] You have been renamed. Please use the password of the old account and then go to [[Special:MergeAccount]] [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:27, 2 October 2024 (UTC) ::Thank you! [[User:RLazarus (WMF)|RLazarus (WMF)]] ([[User talk:RLazarus (WMF)|talk]]) 15:26, 2 October 2024 (UTC) === JMeybohm === {{user2|JMeybohm}} -> {{user2|JMeybohm_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3AJMeybohm_%28WMF%29&diff=27543428&oldid=21557650 confirmation] [[User:JMeybohm|JMeybohm]] ([[User talk:JMeybohm|talk]]) 08:15, 2 October 2024 (UTC) :@[[User:JMeybohm (WMF)|JMeybohm (WMF)]] Done. Please try logging in with the new username but old password. Then go to [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:00, 2 October 2024 (UTC) ::{{done}} - Thanks! [[User:JMeybohm (WMF)|JMeybohm (WMF)]] ([[User talk:JMeybohm (WMF)|talk]]) 10:07, 2 October 2024 (UTC) === Volans === {{user2|Volans}} -> {{user2|RCoccioli (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3ARCoccioli_%28WMF%29&diff=27545646&oldid=21479836 confirmation edit] [[User:Volans|Volans]] ([[User talk:Volans|talk]]) 10:24, 2 October 2024 (UTC) :@[[User:RCoccioli (WMF)|RCoccioli (WMF)]] Done now. Login with your new username but old password. Then use [[Special:MergeAccount]] to unify. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:44, 2 October 2024 (UTC) ::{{done}} - Thanks! [[User:RCoccioli (WMF)|RCoccioli (WMF)]] ([[User talk:RCoccioli (WMF)|talk]]) 10:56, 2 October 2024 (UTC) === Clément Goubert === {{user2|Clément Goubert}} -> {{user2|CGoubert-WMF}} [https://meta.wikimedia.org/w/index.php?title=User:CGoubert-WMF&oldid=27545717 confirmation edit] [[User:Clément Goubert|Clément Goubert]] ([[User talk:Clément Goubert|talk]]) 11:00, 2 October 2024 (UTC) :@[[User:CGoubert-WMF|CGoubert-WMF]]: Done, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:11, 2 October 2024 (UTC) ::{{done}} thank you! [[User:CGoubert-WMF|CGoubert-WMF]] ([[User talk:CGoubert-WMF|talk]]) 09:23, 3 October 2024 (UTC) === Dom Walden === {{user2|Dom_Walden}} -> {{user2|DWalden_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:DWalden_(WMF)&diff=prev&oldid=27546037 link to confirmation edit on a SUL wiki] [[User:Dom Walden|Dom Walden]] ([[User talk:Dom Walden|talk]]) 11:55, 2 October 2024 (UTC) :@[[User:DWalden (WMF)|DWalden (WMF)]]: {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:17, 2 October 2024 (UTC) === Francesco Negri === {{user2|FNegri}} -> {{user2|FNegri-WMF}} [https://meta.wikimedia.org/w/index.php?diff=27546410 confirmation edit] (Note: I mistakenly already logged in to Wikitech with my SUL account, sorry about that!) [[User:FNegri|FNegri]] ([[User talk:FNegri|talk]]) 14:03, 2 October 2024 (UTC) :@[[User:FNegri-WMF|FNegri-WMF]] {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:20, 2 October 2024 (UTC) === Dreamrimmer === {{user2|Dreamrimmer}} -> {{user2|DreamRimmer}} [https://meta.wikimedia.org/w/index.php?title=User:DreamRimmer&diff=prev&oldid=27551054 link to confirmation edit on a SUL wiki] [[User:Dreamrimmer|Dreamrimmer]] ([[User talk:Dreamrimmer|talk]]) 13:52, 3 October 2024 (UTC) :@[[User:DreamRimmer|DreamRimmer]] {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 15:38, 3 October 2024 (UTC) === Rexo === {{user2|Remagoxer}} -> {{user2|Rexogamer}} [https://en.wikipedia.org/w/index.php?title=User:Rexogamer&diff=prev&oldid=1249325350 confirmation on enwiki]. note that I accidentally created an account here yesterday, so that should be renamed first. [[User:Remagoxer|Remagoxer]] ([[User talk:Remagoxer|talk]]) 10:24, 4 October 2024 (UTC) :{{Done}} [[User:Rexogamer]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:09, 4 October 2024 (UTC) === Ben Tullis === {{user2|Btullis}} -> {{user2|BTullis_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:BTullis_(WMF)/RAID&oldid=24240616] [[User:BTullis (WMF)|BTullis (WMF)]] ([[User talk:BTullis (WMF)|talk]]) 10:26, 4 October 2024 (UTC) It seems that I was migrated automatically, but the new account has 0 edits and default prefs. Can these be migrated please, if possible? :{{done}} [[User:BTullis (WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:06, 4 October 2024 (UTC) === Al Riaz Uddin Ripon === {{user2|Al Riaz Uddin Ripon}} -> {{user2|RiazACU}} Verification edit is [https://meta.m.wikimedia.org/w/index.php?title=User:RiazACU/test&oldid=27558462 here] - [[User:Al Riaz Uddin Ripon|Al Riaz Uddin ]] ([[User talk:Al Riaz Uddin Ripon|talk]]) 07:56, 5 October 2024 (UTC) :@[[User:Al Riaz Uddin Ripon|Al Riaz Uddin Ripon]] Hi, your SUL verification doesn't mention your old username, is there concern about it? If so, can you send me an email via email to user functionality on-wiki with SUL account and mention the old account please so we can properly verify ownership? Thank you! [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:23, 7 October 2024 (UTC) ::{{re|Ladsgroup}} Please check edit source [https://meta.m.wikimedia.org/w/index.php?title=User:RiazACU/test&oldid=27558462 here] also see previous rename request [https://meta.m.wikimedia.org/w/index.php?title=Special:Log&logid=55487299 logid] - [[User:Al Riaz Uddin Ripon|Al Riaz Uddin ]] ([[User talk:Al Riaz Uddin Ripon|talk]]) 14:30, 7 October 2024 (UTC) :Thanks. {{Done}} now. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:34, 7 October 2024 (UTC) === Chlod Alejandro === {{user2|Chlod Alejandro}} -> {{user2|Chlod}} [https://meta.wikimedia.org/w/index.php?title=User:Chlod/matrix&diff=prev&oldid=27565238] <span style="font-weight: bold; font-style: italic;">[[User:Chlod Alejandro|Chlod]]</span> <span style="font-size: calc(1em - 2pt);">([[User talk:Chlod Alejandro|say hi!]]) (please ping on reply)</span> 18:14, 6 October 2024 (UTC) :@[[User:Chlod|Chlod]] {{Done}}. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 18:36, 6 October 2024 (UTC) === Houseblaster === {{user2|Houseblaster}} -> {{user2|HouseBlaster}} [[:en:Special:Diff/1249831567|confirmation edit at enwiki]]. [[User:Houseblaster|Houseblaster]] ([[User talk:Houseblaster|talk]]) 02:14, 7 October 2024 (UTC) :@[[User:HouseBlaster|HouseBlaster]] {{Done}}. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 04:38, 7 October 2024 (UTC) === Christine Stone === {{user2|Cstone}} -> {{user2|CStone_WMF}} [https://meta.wikimedia.org/w/index.php?title=User:CStone_(WMF)&diff=prev&oldid=27575323 confirmation] :{{done}} [[User:CStone_WMF]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:06, 9 October 2024 (UTC) :According to the note at the top of this page, the new username is supposed to match the user's SUL username. There is no global account for the requested new username. The confirmation edit was instead made using [[m:User:CStone (WMF)]]. Does the requester intend to use the existing SUL username (from the confirmation edit), or to instead keep the accounts separate (and renaming merely to resolve a username conflict with someone else's account)? [[User:PleaseStand|''Please'''''Stand''']] ([[User talk:PleaseStand|talk]]) 07:29, 10 October 2024 (UTC) ::@[[User:PleaseStand|PleaseStand]] You are correct. @[[User:CStone WMF|CStone_WMF]] I need to rename your account to [[User:CStone_(WMF)]]. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:16, 14 October 2024 (UTC) :::Sorry for the confusion, I logged in successfully now! [[User:CStone (WMF)|CStone (WMF)]] ([[User talk:CStone (WMF)|talk]]) 01:49, 18 October 2024 (UTC) === すずねーう === {{user2|すずねーう}} -> {{user2|鈴音雨}} [https://meta.wikimedia.org/w/index.php?title=User%3A%E9%88%B4%E9%9F%B3%E9%9B%A8&diff=27585730&oldid=26719888 confirmation] --[[User:すずねーう|すずねーう]] ([[User talk:すずねーう|talk]]) 01:30, 11 October 2024 (UTC) :{{Done}} @[[User:鈴音雨|鈴音雨]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:15, 14 October 2024 (UTC) === Anzx === {{user2|Anzx}} -> {{user2|~aanzx}} [https://meta.wikimedia.org/w/index.php?title=User:~aanzx&diff=prev&oldid=27615807] , i don't know how but https://wikitech.wikimedia.org/w/index.php?title=Special:Log&logid=954589 seems to be created already.[[User:&#126;aanzx|&#126;aanzx]] ([[User talk:&#126;aanzx|talk]]) 05:11, 17 October 2024 (UTC) ::confirmation from other account:[[User:Anzx|Anzx]] ([[User talk:Anzx|talk]]) 05:12, 17 October 2024 (UTC) :{{Done}} Hi @[[User:~aanzx|~aanzx]] You have been renamed. Please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 11:14, 17 October 2024 (UTC) ::@[[User:Ladsgroup|Ladsgroup]] thank you, logged in successfully.[[User:&#126;aanzx|&#126;aanzx]] ([[User talk:&#126;aanzx|talk]]) 11:16, 17 October 2024 (UTC) === revi === * {{User2|Revi}} -> {{User2|-revi}} I'm sad I have to get the <code>-</code> in my username but that's how it is. Confirmation from SUL coming shortly. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:07, 26 October 2024 (UTC) :Also, This page should rather do <code><nowiki>__NEWSECTIONLINK__</nowiki></code>. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:07, 26 October 2024 (UTC) :Here we go. &mdash; [[User:-revi|<span style="color:green">레비</span>]][[User talk:-revi|<span style="color:green"><small>Revi</small></span>]] 11:08, 26 October 2024 (UTC) :Diffs: [[Special:Diff/2239078|request]] and [[Special:Diff/2239079|confirmation]]. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:10, 26 October 2024 (UTC) :@[[User:-revi|-revi]], {{done}}. You should be able to unify your account now. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 11:19, 26 October 2024 (UTC) ::<code>The account was migrated to the unified account.</code> Thanks. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:20, 26 October 2024 (UTC) === Stimoroll === {{user2|Stimoroll}} -> {{user2|KrzysztofPoplawski}} === Robert Timm (WMDE) === {{user2|Robert Timm}} -> {{user2|Robert Timm (WMDE)}} [https://meta.wikimedia.org/w/index.php?title=User%3ARobert_Timm_%28WMDE%29&diff=27702429&oldid=25853034 Confirmation edit on Meta] [[User:Robert Timm|Robert Timm]] ([[User talk:Robert Timm|talk]]) 08:33, 4 November 2024 (UTC) :Hi @[[User:Robert Timm (WMDE)|Robert Timm (WMDE)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:15, 4 November 2024 (UTC) === Hex === {{user2|Hex}} -> {{user2|Scott}} [https://en.wikipedia.org/w/index.php?title=Wikipedia:Changing_username/Usurpations&diff=prev&oldid=1256529095&diffonly=1 Confirmation edit on enwp]. See also {{phab|379483}}. [[User:Hex|Hex]] ([[User talk:Hex|talk]]) 10:30, 10 November 2024 (UTC) :Done, can you try merging your wikitech account in your global account now? [[User:Zabe|Zabe]] ([[User talk:Zabe|talk]]) 13:14, 10 November 2024 (UTC) qhrafxp0ammpzpm0nrd8yfm02se4ep1 2243213 2243200 2024-11-10T23:27:53Z Zabe 21763 /* Hex */ 2243213 wikitext text/x-wiki Users can '''request a username change to match their SUL username'''. They must prove they are the same person by confirming the rename on wiki in both places: wikitech and one of SUL wikis. Renames are done by [[Wikitech:Bureaucrats|bureaucrats]] using [[Special:RenameUser]]. Actions are logged at [[Special:Log/renameuser]]. == Requests == <pre> === Example === {{user2|Foo}} -> {{user2|Bar}} [link to confirmation edit on a SUL wiki] ~~~~ </pre> === Jgiannelos === {{user2|Jgiannelos}} -> {{user2|JGiannelos (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:JGiannelos_(WMF)&diff=prev&oldid=27611127 Confirmation edit on Meta] [[User:Jgiannelos|Jgiannelos]] ([[User talk:Jgiannelos|talk]]) 09:28, 16 October 2024 (UTC) :{{Done}} @[[User:JGiannelos (WMF)|JGiannelos (WMF)]], your legacy Jgiannelos has been renamed. I also needed to rename [[User:JGiannelos (WMF) (usurped)]] to move it out of the way of unifying your account names. Please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. -- [[User:BryanDavis|BryanDavis]] ([[User talk:BryanDavis|talk]]) 17:45, 16 October 2024 (UTC) === LSobanski === {{user2|LSobanski}} -> {{user2|LSobanski (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3ALSobanski_%28WMF%29&diff=27578184&oldid=21985507 Confirmation edit on Meta] * {{done}} [[User:LSobanski (WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. --[[User:BryanDavis|BryanDavis]] ([[User talk:BryanDavis|talk]]) 18:18, 9 October 2024 (UTC) === Alangi Derick === {{user2|Alangi Derick}} -> {{user2|X-Savitar}} Not sure if this is an account merge or rename. But both are the same users and would like to transfer my edits from the former to the later. Conformation [[:meta:Special:Diff/27572366]]. Thank you! :{{Done}} [[User:X-Savitar]] You are renamed here, please login with the old password but new username (if it doesn't work, use Special:PasswordReset to reset it) and then once logged in, use Special:MergeAccount to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:34, 8 October 2024 (UTC) ::Thank you very much @[[User:Ladsgroup|Ladsgroup]]. I appreciate, everything looks good now. 🙏🏽 -- [[User:X-Savitar|X-Savitar]] ([[User talk:X-Savitar|talk]]) 18:51, 8 October 2024 (UTC) === Ahmon Dancy === {{user2|Ahmon Dancy}} -> {{user2|ADancy_(WMF)}} Confirmation [[:meta:Special:Diff/27570215]] Please and thank you. :{{Done}} [[User:ADancy_(WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use Special:PasswordReset to reset it) and then once logged in, use Special:MergeAccount to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:36, 8 October 2024 (UTC) === Launchpad === {{user2|Launchpad}} -> {{user2|Launchpad555}} :My confirmation: [[:meta:Special:Diff/27519615]]. ::{{Done}} [[User:Launchpad555]]: Renamed, please use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:44, 1 October 2024 (UTC) === NMW03 === {{user2|NMW03}} -> {{user2|Nemoralis}} :My confirmation: [[:meta:Special:Diff/27480283]]. Is it possible to change shell username? [[User:NMW03|NMW03]] ([[User talk:NMW03|talk]]) 16:19, 18 September 2024 (UTC) :@[[User:NMW03|NMW03]] Renaming shell username is not currently possible. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:56, 24 September 2024 (UTC) :{{Done}} @[[User:Nemoralis|Nemoralis]]: Done, please use [[Special:MergeAccount]] to unify [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:45, 1 October 2024 (UTC) :: Thanks! [[User:Nemoralis|Nemoralis]] ([[User talk:Nemoralis|talk]]) 10:34, 2 October 2024 (UTC) === Massslywmde === {{user2|Massslywmde}} -> {{user2|Mohammed Abdulai (WMDE)}} :My confirmation: [[:meta:Special:Diff/27493737]]. -[[User:Massslywmde|Massslywmde]] ([[User talk:Massslywmde|talk]]) 12:21, 21 September 2024 (UTC) :@[[User:Mohammed Abdulai (WMDE)|Mohammed Abdulai (WMDE)]] {{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:32, 1 October 2024 (UTC) ::Please [[Special:MergeAccount]] to unify your wikitech and SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:33, 1 October 2024 (UTC) === Samtar === {{user2|Samtar}} -> {{user2|TheresNoTime}} :[[:mw:Special:Diff/6768644|Confirmation]], but am now realising that I had created {{u|TheresNoTime}} and got it locked per [[:phab:T302109|all this fun]], so maybe it's not going to be worth risking it (: any thoughts? [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 10:09, 23 September 2024 (UTC) :@[[User:Samtar|Samtar]] That's easy, we can rename TheresNoTime to "TheresNoTime (usurped)" and then rename this account to that. Would that be fine? [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:56, 24 September 2024 (UTC) ::{{re|Ladsgroup}} oh yeah! That'd be perfect, thank you :-) [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 11:31, 24 September 2024 (UTC) :::{{re|Samtar}} Wouldn't it make sense to rename/link the account to one of your [[m:User:TheresNoTime/disclosure#Alts.|SUL alt accounts]] and then unblock to avoid having another SUL account? --[[User:Nintendofan885|Nintendofan885]] ([[User talk:Nintendofan885|talk]]) 18:22, 24 September 2024 (UTC) ::::Don't want to make things even more complex — I'll just let Ladsgroup do as suggested. Thanks for the suggestion though :-) -- [[User:Samtar|Samtar]] ([[User talk:Samtar|talk]]) 13:05, 1 October 2024 (UTC) :{{Done}} @[[User:TheresNoTime|TheresNoTime]] Done now. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 18:14, 1 October 2024 (UTC) === Zoranzoki21 === {{user2|Zoranzoki21}} -> {{user2|Kizule (usurped)}} Per [[phab:T260647]]. I have <code>Kizule (usurped)</code> already available on Wikimedia's wikis, so SUL won't be an issue. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 22:54, 24 September 2024 (UTC) :{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:21, 2 October 2024 (UTC) ::It looks like that I made a mistake. Kizule (usurped) is actually someone else's account (in the past when my username was Zoranzoki21 and I wanted to rename account to Kizule to reflect my real-life nickname, Kizule was unavailable, therefore stewards moved it so they can "make a place". ::Can you rename Kizule (usurped) to Kizule (test) which is trully mine, and I can confirm that it's mine for real, therefore finish the process of merging via Special:MergeAccount? [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:40, 7 October 2024 (UTC) :::Actually.. I'm placing this on a hold, I want to rename Kizule (test) on WMF's wikis, therefore this has to wait. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:41, 7 October 2024 (UTC) ::::I made a request to get Kizule (test) renamed to Kizule2. Can you rename Kizule (usurped) to Kizule2? Once it's renamed on other SUL wikis as well, I'll use Special:MergeAccount and complete my part of the process. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 17:46, 7 October 2024 (UTC) :::::Actually... Sorry for confusion. Kizule2 is actually unavailable, and it's a bug that it allowed me to ask for <code>Kizule 2</code> when <code>Kizule2</code> exists. :::::Okay, let's just rename Kizule (usurped) to Kizule (test) and finish this for once. [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 18:27, 7 October 2024 (UTC) ::::::{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:10, 8 October 2024 (UTC) :::::::Thank you so much, my part is done as well! :) [[User:Kizule|Kizule]] ([[User talk:Kizule|talk]]) 13:59, 8 October 2024 (UTC) === Nhatminh01 === {{user2|Nhatminh01}} -> {{user2|JrandWP}} Per [[mw:Topic:Wmjt52wwyz7rssiz]]. [[User:Nhatminh01|Nhatminh01]] ([[User talk:Nhatminh01|talk]]) 09:43, 1 October 2024 (UTC) :{{Not done}} @[[User:JrandWP|JrandWP]] Now that you made an account, I can't rename your old one. It has to stay. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 13:51, 1 October 2024 (UTC) === Jcrespo === {{user2|Jcrespo}} -> {{user2|JCrespo (WMF)}} See [https://en.wikipedia.org/w/index.php?title=User%3AJCrespo_%28WMF%29&diff=1248772128&oldid=1233297733 confirmation] and [https://phabricator.wikimedia.org/p/jcrespo/ phab profile] -- [[User:Jcrespo|Jcrespo]] 11:43, 1 October 2024 (UTC) :@[[User:JCrespo (WMF)|JCrespo (WMF)]]: {{Done}} please use [[Special:MergeAccount]] to unify your accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:38, 1 October 2024 (UTC) ::Thank you! All good. -- [[User:JCrespo (WMF)|JCrespo (WMF)]] ([[User talk:JCrespo (WMF)|talk]]) 13:07, 1 October 2024 (UTC) === Arturo Borrero Gonzalez === {{user2|Arturo Borrero Gonzalez}} -> {{user2|ABorrero (WMF)}} See [https://meta.wikimedia.org/w/index.php?title=User%3AABorrero_%28WMF%29&diff=27540035&oldid=27159444 confirmation] and [https://phabricator.wikimedia.org/p/aborrero/ phab profile ] [[User:ABorrero (WMF)|ABorrero (WMF)]] ([[User talk:ABorrero (WMF)|talk]]) 14:51, 1 October 2024 (UTC) :{{done}} [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:44, 2 October 2024 (UTC) === David Caro === {{user2|David Caro}} -> {{user2|DCaro (WMF)}} See [https://meta.wikimedia.org/w/index.php?title=User:DCaro_(WMF)&oldid=27540174 confirmation] and [https://phabricator.wikimedia.org/p/dcaro/ phab profile ] [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:18, 1 October 2024 (UTC) :@[[User:DCaro (WMF)|DCaro (WMF)]]: {{Done}}, please login with the new username and old password and use [[Special:MergeAccount]] to unify with SUL. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:14, 2 October 2024 (UTC) ::It worked, thanks! [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 07:51, 14 October 2024 (UTC) === Bartosz Dziewoński === {{user2|Bartosz Dziewoński}} -> {{user2|Matma Rex}} See https://phabricator.wikimedia.org/p/matmarex/ as confirmation, which is linked to both accounts. Unfortunately the "Matma Rex" account has been already created here automatically when I visited the wiki, so it will have to be renamed away first. [[User:Bartosz Dziewoński|Bartosz Dziewoński]] ([[User talk:Bartosz Dziewoński|talk]]) 19:04, 1 October 2024 (UTC) :@[[User:Matma Rex]] {{Done}} now. Now you can unify your accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 21:02, 1 October 2024 (UTC) ::Done, thanks! [[User:Matma Rex|Matma Rex]] ([[User talk:Matma Rex|talk]]) 14:07, 2 October 2024 (UTC) === RLazarus === {{user2|RLazarus}} -> {{user2|RLazarus (WMF)}} ([https://meta.wikimedia.org/w/index.php?title=User:RLazarus_(WMF)&diff=prev&oldid=27541361 confirmation]) As discussed with [[User:Ladsgroup|Ladsgroup]] on IRC, I incorrectly Special:MergeAccounts'd the RLazarus_(WMF) account here but haven't edited with it. [[User:RLazarus|RLazarus]] ([[User talk:RLazarus|talk]]) 21:19, 1 October 2024 (UTC) :{{Done}} @[[User:RLazarus (WMF)|RLazarus]] You have been renamed. Please use the password of the old account and then go to [[Special:MergeAccount]] [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:27, 2 October 2024 (UTC) ::Thank you! [[User:RLazarus (WMF)|RLazarus (WMF)]] ([[User talk:RLazarus (WMF)|talk]]) 15:26, 2 October 2024 (UTC) === JMeybohm === {{user2|JMeybohm}} -> {{user2|JMeybohm_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3AJMeybohm_%28WMF%29&diff=27543428&oldid=21557650 confirmation] [[User:JMeybohm|JMeybohm]] ([[User talk:JMeybohm|talk]]) 08:15, 2 October 2024 (UTC) :@[[User:JMeybohm (WMF)|JMeybohm (WMF)]] Done. Please try logging in with the new username but old password. Then go to [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:00, 2 October 2024 (UTC) ::{{done}} - Thanks! [[User:JMeybohm (WMF)|JMeybohm (WMF)]] ([[User talk:JMeybohm (WMF)|talk]]) 10:07, 2 October 2024 (UTC) === Volans === {{user2|Volans}} -> {{user2|RCoccioli (WMF)}} [https://meta.wikimedia.org/w/index.php?title=User%3ARCoccioli_%28WMF%29&diff=27545646&oldid=21479836 confirmation edit] [[User:Volans|Volans]] ([[User talk:Volans|talk]]) 10:24, 2 October 2024 (UTC) :@[[User:RCoccioli (WMF)|RCoccioli (WMF)]] Done now. Login with your new username but old password. Then use [[Special:MergeAccount]] to unify. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:44, 2 October 2024 (UTC) ::{{done}} - Thanks! [[User:RCoccioli (WMF)|RCoccioli (WMF)]] ([[User talk:RCoccioli (WMF)|talk]]) 10:56, 2 October 2024 (UTC) === Clément Goubert === {{user2|Clément Goubert}} -> {{user2|CGoubert-WMF}} [https://meta.wikimedia.org/w/index.php?title=User:CGoubert-WMF&oldid=27545717 confirmation edit] [[User:Clément Goubert|Clément Goubert]] ([[User talk:Clément Goubert|talk]]) 11:00, 2 October 2024 (UTC) :@[[User:CGoubert-WMF|CGoubert-WMF]]: Done, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:11, 2 October 2024 (UTC) ::{{done}} thank you! [[User:CGoubert-WMF|CGoubert-WMF]] ([[User talk:CGoubert-WMF|talk]]) 09:23, 3 October 2024 (UTC) === Dom Walden === {{user2|Dom_Walden}} -> {{user2|DWalden_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:DWalden_(WMF)&diff=prev&oldid=27546037 link to confirmation edit on a SUL wiki] [[User:Dom Walden|Dom Walden]] ([[User talk:Dom Walden|talk]]) 11:55, 2 October 2024 (UTC) :@[[User:DWalden (WMF)|DWalden (WMF)]]: {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:17, 2 October 2024 (UTC) === Francesco Negri === {{user2|FNegri}} -> {{user2|FNegri-WMF}} [https://meta.wikimedia.org/w/index.php?diff=27546410 confirmation edit] (Note: I mistakenly already logged in to Wikitech with my SUL account, sorry about that!) [[User:FNegri|FNegri]] ([[User talk:FNegri|talk]]) 14:03, 2 October 2024 (UTC) :@[[User:FNegri-WMF|FNegri-WMF]] {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 16:20, 2 October 2024 (UTC) === Dreamrimmer === {{user2|Dreamrimmer}} -> {{user2|DreamRimmer}} [https://meta.wikimedia.org/w/index.php?title=User:DreamRimmer&diff=prev&oldid=27551054 link to confirmation edit on a SUL wiki] [[User:Dreamrimmer|Dreamrimmer]] ([[User talk:Dreamrimmer|talk]]) 13:52, 3 October 2024 (UTC) :@[[User:DreamRimmer|DreamRimmer]] {{Done}}, please login with the new username but old password and then use [[Special:MergeAccount]] to unify. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 15:38, 3 October 2024 (UTC) === Rexo === {{user2|Remagoxer}} -> {{user2|Rexogamer}} [https://en.wikipedia.org/w/index.php?title=User:Rexogamer&diff=prev&oldid=1249325350 confirmation on enwiki]. note that I accidentally created an account here yesterday, so that should be renamed first. [[User:Remagoxer|Remagoxer]] ([[User talk:Remagoxer|talk]]) 10:24, 4 October 2024 (UTC) :{{Done}} [[User:Rexogamer]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:09, 4 October 2024 (UTC) === Ben Tullis === {{user2|Btullis}} -> {{user2|BTullis_(WMF)}} [https://meta.wikimedia.org/w/index.php?title=User:BTullis_(WMF)/RAID&oldid=24240616] [[User:BTullis (WMF)|BTullis (WMF)]] ([[User talk:BTullis (WMF)|talk]]) 10:26, 4 October 2024 (UTC) It seems that I was migrated automatically, but the new account has 0 edits and default prefs. Can these be migrated please, if possible? :{{done}} [[User:BTullis (WMF)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 12:06, 4 October 2024 (UTC) === Al Riaz Uddin Ripon === {{user2|Al Riaz Uddin Ripon}} -> {{user2|RiazACU}} Verification edit is [https://meta.m.wikimedia.org/w/index.php?title=User:RiazACU/test&oldid=27558462 here] - [[User:Al Riaz Uddin Ripon|Al Riaz Uddin ]] ([[User talk:Al Riaz Uddin Ripon|talk]]) 07:56, 5 October 2024 (UTC) :@[[User:Al Riaz Uddin Ripon|Al Riaz Uddin Ripon]] Hi, your SUL verification doesn't mention your old username, is there concern about it? If so, can you send me an email via email to user functionality on-wiki with SUL account and mention the old account please so we can properly verify ownership? Thank you! [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:23, 7 October 2024 (UTC) ::{{re|Ladsgroup}} Please check edit source [https://meta.m.wikimedia.org/w/index.php?title=User:RiazACU/test&oldid=27558462 here] also see previous rename request [https://meta.m.wikimedia.org/w/index.php?title=Special:Log&logid=55487299 logid] - [[User:Al Riaz Uddin Ripon|Al Riaz Uddin ]] ([[User talk:Al Riaz Uddin Ripon|talk]]) 14:30, 7 October 2024 (UTC) :Thanks. {{Done}} now. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:34, 7 October 2024 (UTC) === Chlod Alejandro === {{user2|Chlod Alejandro}} -> {{user2|Chlod}} [https://meta.wikimedia.org/w/index.php?title=User:Chlod/matrix&diff=prev&oldid=27565238] <span style="font-weight: bold; font-style: italic;">[[User:Chlod Alejandro|Chlod]]</span> <span style="font-size: calc(1em - 2pt);">([[User talk:Chlod Alejandro|say hi!]]) (please ping on reply)</span> 18:14, 6 October 2024 (UTC) :@[[User:Chlod|Chlod]] {{Done}}. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 18:36, 6 October 2024 (UTC) === Houseblaster === {{user2|Houseblaster}} -> {{user2|HouseBlaster}} [[:en:Special:Diff/1249831567|confirmation edit at enwiki]]. [[User:Houseblaster|Houseblaster]] ([[User talk:Houseblaster|talk]]) 02:14, 7 October 2024 (UTC) :@[[User:HouseBlaster|HouseBlaster]] {{Done}}. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 04:38, 7 October 2024 (UTC) === Christine Stone === {{user2|Cstone}} -> {{user2|CStone_WMF}} [https://meta.wikimedia.org/w/index.php?title=User:CStone_(WMF)&diff=prev&oldid=27575323 confirmation] :{{done}} [[User:CStone_WMF]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 10:06, 9 October 2024 (UTC) :According to the note at the top of this page, the new username is supposed to match the user's SUL username. There is no global account for the requested new username. The confirmation edit was instead made using [[m:User:CStone (WMF)]]. Does the requester intend to use the existing SUL username (from the confirmation edit), or to instead keep the accounts separate (and renaming merely to resolve a username conflict with someone else's account)? [[User:PleaseStand|''Please'''''Stand''']] ([[User talk:PleaseStand|talk]]) 07:29, 10 October 2024 (UTC) ::@[[User:PleaseStand|PleaseStand]] You are correct. @[[User:CStone WMF|CStone_WMF]] I need to rename your account to [[User:CStone_(WMF)]]. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:16, 14 October 2024 (UTC) :::Sorry for the confusion, I logged in successfully now! [[User:CStone (WMF)|CStone (WMF)]] ([[User talk:CStone (WMF)|talk]]) 01:49, 18 October 2024 (UTC) === すずねーう === {{user2|すずねーう}} -> {{user2|鈴音雨}} [https://meta.wikimedia.org/w/index.php?title=User%3A%E9%88%B4%E9%9F%B3%E9%9B%A8&diff=27585730&oldid=26719888 confirmation] --[[User:すずねーう|すずねーう]] ([[User talk:すずねーう|talk]]) 01:30, 11 October 2024 (UTC) :{{Done}} @[[User:鈴音雨|鈴音雨]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 09:15, 14 October 2024 (UTC) === Anzx === {{user2|Anzx}} -> {{user2|~aanzx}} [https://meta.wikimedia.org/w/index.php?title=User:~aanzx&diff=prev&oldid=27615807] , i don't know how but https://wikitech.wikimedia.org/w/index.php?title=Special:Log&logid=954589 seems to be created already.[[User:&#126;aanzx|&#126;aanzx]] ([[User talk:&#126;aanzx|talk]]) 05:11, 17 October 2024 (UTC) ::confirmation from other account:[[User:Anzx|Anzx]] ([[User talk:Anzx|talk]]) 05:12, 17 October 2024 (UTC) :{{Done}} Hi @[[User:~aanzx|~aanzx]] You have been renamed. Please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. Thank you [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 11:14, 17 October 2024 (UTC) ::@[[User:Ladsgroup|Ladsgroup]] thank you, logged in successfully.[[User:&#126;aanzx|&#126;aanzx]] ([[User talk:&#126;aanzx|talk]]) 11:16, 17 October 2024 (UTC) === revi === * {{User2|Revi}} -> {{User2|-revi}} I'm sad I have to get the <code>-</code> in my username but that's how it is. Confirmation from SUL coming shortly. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:07, 26 October 2024 (UTC) :Also, This page should rather do <code><nowiki>__NEWSECTIONLINK__</nowiki></code>. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:07, 26 October 2024 (UTC) :Here we go. &mdash; [[User:-revi|<span style="color:green">레비</span>]][[User talk:-revi|<span style="color:green"><small>Revi</small></span>]] 11:08, 26 October 2024 (UTC) :Diffs: [[Special:Diff/2239078|request]] and [[Special:Diff/2239079|confirmation]]. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:10, 26 October 2024 (UTC) :@[[User:-revi|-revi]], {{done}}. You should be able to unify your account now. [[User:Taavi|Taavi]] ([[User talk:Taavi|talk!]]) 11:19, 26 October 2024 (UTC) ::<code>The account was migrated to the unified account.</code> Thanks. &mdash; [[User:Revi|<span style="color:green">레비</span>]][[User talk:Revi|<span style="color:green"><small>Revi</small></span>]] 11:20, 26 October 2024 (UTC) === Stimoroll === {{user2|Stimoroll}} -> {{user2|KrzysztofPoplawski}} === Robert Timm (WMDE) === {{user2|Robert Timm}} -> {{user2|Robert Timm (WMDE)}} [https://meta.wikimedia.org/w/index.php?title=User%3ARobert_Timm_%28WMDE%29&diff=27702429&oldid=25853034 Confirmation edit on Meta] [[User:Robert Timm|Robert Timm]] ([[User talk:Robert Timm|talk]]) 08:33, 4 November 2024 (UTC) :Hi @[[User:Robert Timm (WMDE)|Robert Timm (WMDE)]] You are renamed here, please login with the old password but new username (if it doesn't work, use [[Special:PasswordReset]] to reset it) and then once logged in, use [[Special:MergeAccount]] to unify with the rest SUL accounts. [[User:Ladsgroup|Ladsgroup]] ([[User talk:Ladsgroup|talk]]) 14:15, 4 November 2024 (UTC) === Hex === {{user2|Hex}} -> {{user2|Scott}} [https://en.wikipedia.org/w/index.php?title=Wikipedia:Changing_username/Usurpations&diff=prev&oldid=1256529095&diffonly=1 Confirmation edit on enwp]. See also {{phab|379483}}. [[User:Hex|Hex]] ([[User talk:Hex|talk]]) 10:30, 10 November 2024 (UTC) :{{Done}}, can you try merging your wikitech account in your global account now? [[User:Zabe|Zabe]] ([[User talk:Zabe|talk]]) 13:14, 10 November 2024 (UTC) am4l1joowk6pw2wan4ndb7hfi4gnjbu Nova Resource:Tofuinfratest-ed3bbbab-615a-46ae-897a-86d68668e354 498 456111 2243194 2024-11-10T12:00:17Z Labslogbot 55 Auto update of instance info. 2243194 wikitext text/x-wiki <!-- autostatus begin --> {{Nova Resource |Resource Type=project |Project ID=dace440d59b04ab1bbb91fd3a326af46 |Project Name=tofuinfratest-ed3bbbab-615a-46ae-897a-86d68668e354}} <!-- autostatus end --> 4ksxl31r56byiubv9w7pzydr8zn1t2p User:Scott 2 456112 2243204 2024-11-10T18:13:36Z DreamRimmer 37491 DreamRimmer moved page [[User:Scott]] to [[User:Hex]]: Automatically moved page while renaming the user "[[Special:CentralAuth/Scott|Scott]]" to "[[Special:CentralAuth/Hex|Hex]]" 2243204 wikitext text/x-wiki #REDIRECT [[User:Hex]] 6ehj4q77wb1ay3mdim9v1pz7ooe8ast Nova Resource:5238620953c040e7a9effbe47d4e0932/SAL 498 456113 2243206 2024-11-10T18:23:01Z Stashbot 7414 physikerwelt: start setting up puppet Project_puppetserver attempt 2 -> T379501 2243206 wikitext text/x-wiki === 2024-11-10 === * 18:23 physikerwelt: start setting up puppet Project_puppetserver attempt 2 -> [[phab:T379501|T379501]] <noinclude>[[Category:SAL]]</noinclude> gbdqg12vc8au5yjlngjtvrieywd26ua User talk:Essaidib2 3 456114 2243243 2024-11-11T10:43:44Z StrikerBot 8475 Welcome to Toolforge! 2243243 wikitext text/x-wiki == Welcome to Toolforge! == Hello Essaidib2, welcome to the Toolforge project! Your request for access was processed, and you should be able to use ssh to connect to <tt>login.toolforge.org</tt>. You will need to logout and login again at https://toolsadmin.wikimedia.org/ to activate your new permissions there. Check the [[Help:Toolforge|Toolforge help page]] for tips on using your account. You can also ask questions in our IRC channel at {{irc|wikimedia-cloud}} or send an e-mail to our mailing list <tt>cloud@lists.wikimedia.org</tt>. Thank you, and have fun making Tools! --[[User:StrikerBot|StrikerBot]] ([[User talk:StrikerBot|talk]]) 10:43, 11 November 2024 (UTC) rw2hzdr26r5zjdbysmfn3hhu00icyay User talk:Bunnypranav 3 456115 2243244 2024-11-11T10:44:02Z StrikerBot 8475 Welcome to Toolforge! 2243244 wikitext text/x-wiki == Welcome to Toolforge! == Hello Bunnypranav, welcome to the Toolforge project! Your request for access was processed, and you should be able to use ssh to connect to <tt>login.toolforge.org</tt>. You will need to logout and login again at https://toolsadmin.wikimedia.org/ to activate your new permissions there. Check the [[Help:Toolforge|Toolforge help page]] for tips on using your account. You can also ask questions in our IRC channel at {{irc|wikimedia-cloud}} or send an e-mail to our mailing list <tt>cloud@lists.wikimedia.org</tt>. Thank you, and have fun making Tools! --[[User:StrikerBot|StrikerBot]] ([[User talk:StrikerBot|talk]]) 10:44, 11 November 2024 (UTC) sb3qtafubgz5rboezk8ij2m7egj98g5